Another Dive into PCA in a Practical View
Understanding how to use PCA for applications as a machine learning practitioner
There are thousands of articles on Towards Data Science on PCA-related topics. Well, I am contributing my view to the thousands. However, I am trying my best to explain PCA in a code-first, practical manner that may change your view on PCA. No matter if you are a beginner or a PCA master, I am sure you will find this blog refreshing and useful. Principal Component Analysis, or PCA, is a powerful tool that's widely used for data science applications, such as dimension reduction, feature extraction, and data visualization.
PCA is commonly addressed in interviews, so it's important to understand the definition of PCA:
The orthogonal projection of the data onto a lower-dimensional linear space (the principal subspace), such that the variance of the projected data is maximized. — Hotelling, 1933
This formal definition captures the essence of PCA: variance maximization in lower dimensions. Let's dive into the details of PCA using code examples.
Get Started
To get started with this post and the following examples, you will need:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from matplotlib import image

%matplotlib inline
plt.style.use('ggplot')
We will dig into three examples of PCA:
- with a mock dataset
- with the IRIS dataset
- with my IMAGE
I am sure after these three examples, you will understand PCA and know how to use it properly in applications.
With a mock dataset
Before jumping into the code, we can simplify PCA to this problem: we have some data x, and we want to find z = Wx such that dim(z) < dim(x). But how do we find W? Well, it's pretty hard to explain this in high dimensions, so let's consider a very simple example in 2D: we have a vector x in 2D space, and we have to find W such that z covers the most variance.
As you may remember from your linear algebra class, the dot product works as follows:
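Writing θ for the angle between w and x (and assuming w is a unit vector, so that z is simply the length of the projection of x onto w):

$$ z = w^\top x = \lVert w \rVert \, \lVert x \rVert \cos\theta = \lVert x \rVert \cos\theta $$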
We can see that if we want z to cover more variance, we need to rotate W so that it follows the direction of x. We will explain this in the mock example. Let's start with the code.
x = np.arange(1, 10)
y = 2*x + np.random.rand(9) * 2   # roughly linear data with a bit of noise

plt.scatter(x, y)
plt.xlabel('experience(year)')
plt.ylabel('salary(k)')
As we can see, we have some data x, and we want to find W such that z = Wx covers the most variance. If you project each point onto the line, the variance covered in direction 2 is much larger than in direction 1 (direction 1 and direction 2 refer to two possible choices of the direction of W). However, this is not yet the W that will produce the first principal component. Why? WE NEED TO STANDARDIZE FIRST! Remember, before you perform PCA, make sure you scale your data so that each column has a standard deviation of 1 and a mean of 0. Let's standardize the data with the following code:
df = pd.DataFrame({'v1': x, 'v2': y})
df = StandardScaler().fit_transform(df)
df = pd.DataFrame(df, columns=['x', 'y'])
plt.scatter(df.x, df.y)
Standardize the data using StandardScaler from sklearn.preprocessing. Now we can start finding W:
pca = PCA(n_components=1)
pca.fit(df)
pc1 = pca.transform(df)
pc1
# array([[-2.1107132 ],
#        [-1.66336608],
#        [-1.04335203],
#        [-0.53589344],
#        [-0.0428135 ],
#        [ 0.39641593],
#        [ 1.01178638],
#        [ 1.67023775],
#        [ 2.3176982 ]])

inverse = pca.inverse_transform(pc1)
inverse = pd.DataFrame(inverse, columns=['x', 'y'])
plt.scatter(df.x, df.y, label='standardized x')
plt.plot(inverse.x, inverse.y, 'b', label='w')
plt.legend()
We got W by inverse-transforming principal component one (z). Mathematically, we usually find W using a Lagrange multiplier: we maximize var(z) subject to the constraint that w has unit length (otherwise the variance could be made arbitrarily large). I am not going to dig into the math, but the solution is that w_i is the eigenvector of the covariance matrix of x corresponding to the i-th largest eigenvalue. By the way, you can also solve it with gradient descent.
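As a quick sanity check of that statement, here is a minimal sketch (assuming the standardized df and the fitted pca from the code above); the top eigenvector of the covariance matrix should match sklearn's first principal axis up to a sign flip:

cov = np.cov(df.T)                      # 2x2 covariance matrix of the standardized data
eigvals, eigvecs = np.linalg.eigh(cov)  # np.linalg.eigh returns eigenvalues in ascending order
w = eigvecs[:, -1]                      # eigenvector corresponding to the largest eigenvalue
print(w)                                # for standardized 2D data this is roughly [0.71, 0.71], up to sign
print(pca.components_[0])               # sklearn's first principal axis, same direction as w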
To illustrate the process of finding W, I’d like to include a beautiful gif:
This is a 2D illustration of PCA; the rotating black line is W. I'd like to include a 3D illustration as well:
We can see that the red line is the W that produces PC1, the green line PC2, and the blue line PC3.
With the IRIS dataset
The Iris dataset is a very famous dataset in the machine learning community; it consists of 4 features and 1 target. However, visualizing points in 4-D is quite difficult. Instead, we can use PCA to project all the points from the 4-D space into a 2-D space and visualize them easily. First, load the dataset:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
df = pd.read_csv(url, names=['v1','v2','v3','petal width','target'])
Next, we need to STANDARDIZE as I mentioned above. Notice that we only need to standardize the features here.
x = StandardScaler().fit_transform(df.iloc[:,:-1])
Find pc1 and pc2:
pca = PCA(n_components=2)
pca.fit(x)
pcs = pca.transform(x)
pcs = pd.DataFrame(pcs, columns=['pc1', 'pc2'])
pcs['target'] = df.target  # attach the species label to each pair of PCs
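It is also worth checking how much of the original variance these two components retain; a quick follow-up to the code above (for the standardized Iris features the total is typically around 0.96):

print(pca.explained_variance_ratio_)        # share of variance explained by pc1 and pc2 individually
print(pca.explained_variance_ratio_.sum())  # total variance retained, roughly 0.96 here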
Let’s now visualize it in 2-D:
plt.figure(figsize=(14, 9))
sns.scatterplot(x='pc1', y='pc2', hue='target', data=pcs)
We can see that each species can still be separated well when projected into a 2-D space. Another finding is that red (Iris-setosa) differs a lot from the other two species. The takeaway is that PCA can also be a tool for data visualization, though that is not its most common use.
With my Image
Yes, my image. I have a British Shorthair who would love to be the model for this post. “It’s me guys! What’s up? XD. My name is Grey because I am.”
So we will PCA-transform Mr. Grey today. PCA is pretty popular for image decomposition and compression, as it is a variance-maximization method. With the power of PCA, we can decompose a picture so that it retains the most information with the least memory required. Let's jump straight into it.
img = image.imread('mycat.png')   # read the image as an array (named img so it does not shadow the imported image module)
plt.imshow(img)
plt.title('My Cat')
shape = img.shape
def image_decomposition(img, n_components):
    pca = PCA(n_components=n_components, svd_solver='randomized')
    data = img.reshape(img.shape[0], -1)    # flatten each pixel row (width x channels) into one feature vector
    data = pca.fit_transform(data)
    print(f'With {n_components} principal components, the explained variance ratio is: {pca.explained_variance_ratio_.sum()}')
    temp = pca.inverse_transform(data)
    temp = temp.reshape(img.shape)          # reshape back to the original image dimensions
    temp = np.clip(temp, 0, 1)              # keep pixel values in the valid range for imshow
    plt.imshow(temp)
    plt.title(f'My cat with only {n_components} principal components')

image_decomposition(img, 64)
# With 64 principal components, the explained variance ratio is: 0.994881272315979
Mr. Grey still looks like Mr. Grey with 64 principal components.
image_decomposition(img, 32)
# With 32 principal components, the explained variance ratio is: 0.9856431484222412

image_decomposition(img, 16)
# With 16 principal components, the explained variance ratio is: 0.9685052037239075

image_decomposition(img, 4)
# With 4 principal components, the explained variance ratio is: 0.8802358508110046
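Rather than guessing the number of components, you can also let scikit-learn pick it for you by passing a float to n_components, which keeps just enough components to reach that explained variance ratio. A minimal sketch, assuming the img array loaded above:

pca = PCA(n_components=0.95)            # keep enough components to explain 95% of the variance
data = img.reshape(img.shape[0], -1)    # same flattening as in image_decomposition
reduced = pca.fit_transform(data)
print(pca.n_components_)                # the number of components scikit-learn actually kept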
Summary
PCA is basically linear algebra, and it's a hot topic in interviews. From this blog, you should be able to answer:
1. How does PCA work?
2. How do we find W such that the variance of z is maximized?
3. How do we project high-dimensional data into a lower-dimensional space?
4. How to use PCA on Mr. Grey or your own images :D