Another Dive into PCA in a Practical View


Understanding how to use PCA for applications as a machine learning practitioner

There are thousands of articles on Towards Data Science on PCA-related topics. Well, I am contributing my view to the thousands. However, I am trying my best to explain PCA in a code-first, practical manner that may change your view on PCA. Whether you are a beginner or a PCA master, I am sure you will find this blog refreshing and useful. Principal Component Analysis, or PCA, is a powerful tool that's widely used for data science applications such as dimension reduction, feature extraction, and data visualization.

PCA is commonly addressed in interviews, so it's important to understand the definition of PCA:

The orthogonal projection of the data onto a lower-dimensional linear space (the principal subspace), such that the variance of the projected data is maximized. — Hotelling, 1933

This formal definition explains the essence of PCA: variance maximization in lower dimensions. Let's dive into the details of PCA using code examples.

Get Started

To get started with this post and the following examples, you will need:

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from matplotlib import image
%matplotlib inline
plt.style.use('ggplot')

We will dig into three examples of PCA:

  • with a mock dataset
  • with the IRIS dataset
  • with my IMAGE

I am sure after these three examples, you will understand PCA and know how to use it properly in applications.

With a mock dataset

Before jumping into the code, we can simplify PCA to this problem: we have some data x, and we want to find z = Wx such that dim(z) < dim(x). But how do we find W? It's pretty hard to explain this in high dimensions, so let's consider a very simple example in 2D: we have a vector x in 2D space, and we need to find W such that z covers the most variance.

As you may remember from your linear algebra class, the dot product works as follows:
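For a projection direction w, the projected coordinate z is the dot product of w and x (a standard identity, with θ the angle between the two vectors):

z = w·x = |w| |x| cos(θ)

If we constrain w to unit length, z is simply the length of x's shadow along w, so how much variance z covers depends only on the direction we pick for w.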

We can see that if we want z to cover more variance, we need to rotate W so that it follows the direction of x (for a whole dataset, the direction along which the data varies the most). We will illustrate this with the mock example. Let's start with the code.

x = np.arange(1, 10)
y = 2*x + np.random.rand(9) * 2   # roughly linear data with a bit of noise
plt.scatter(x, y)
plt.xlabel('experience (year)')
plt.ylabel('salary (k)')

As we can see, we have some data x, and we want to find W such that z = Wx covers the most variance. If you project each point onto the line, the variance covered in direction 2 is much larger than in direction 1 (direction 1 and direction 2 refer to two choices of direction for W). However, this is not yet the W that will produce the first principal component. Why? WE NEED TO STANDARDIZE FIRST! Remember, before you perform PCA, make sure you scale your data so that each column has a standard deviation of 1 and a mean of 0. Let's standardize the data with the following code:

df = pd.DataFrame({'v1': x, 'v2':y})
df = StandardScaler().fit_transform(df)
df = pd.DataFrame(df,columns=['x','y'])
plt.scatter(df.x, df.y)

We standardized the data using StandardScaler from sklearn.preprocessing. Now we can start finding W:

pca = PCA(n_components=1)
pca.fit(df)
pc1 = pca.transform(df)
pc1 
# array([[-2.1107132 ],
#       [-1.66336608],
#       [-1.04335203],
#       [-0.53589344],
#       [-0.0428135 ],
#       [ 0.39641593],
#       [ 1.01178638],
#       [ 1.67023775],
#       [ 2.3176982 ]])
inverse = pca.inverse_transform(pc1)   # map PC1 back into the original 2-D space
inverse = pd.DataFrame(inverse, columns=['x','y'])
plt.scatter(df.x, df.y, label='standardized x')
plt.plot(inverse.x, inverse.y, 'b', label='w')   # this line points in the direction of W
plt.legend()

We recovered the direction of W by inverse-transforming principal component one (z) back into the original space and plotting it as a line. Mathematically, we usually find W using a Lagrange multiplier. I am not going to dig into the math, but the solution that maximizes var(z) given x is: the i-th direction w is the eigenvector of the covariance matrix of x corresponding to the i-th largest eigenvalue. By the way, you can also solve it with gradient descent.
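As a quick sanity check (a minimal sketch reusing the standardized df and the fitted pca object from above), we can confirm that the direction sklearn finds matches the eigenvector of the covariance matrix with the largest eigenvalue, up to sign:

cov = np.cov(df.x, df.y)                # 2x2 covariance matrix of the standardized data
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
w_eig = eigvecs[:, -1]                  # eigenvector for the largest eigenvalue
w_pca = pca.components_[0]              # first principal axis found by sklearn
print(w_eig, w_pca)                     # equal up to a possible sign flip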

To illustrate the process of finding W, I’d like to include a beautiful gif:

This is a 2D illustration of PCA; the rotating black line is W. I'd like to include a 3D illustration as well:

We can see that the red line is the W that produces PC1, the green line produces PC2, and the blue line produces PC3.

With the IRIS dataset

The Iris dataset is a very famous dataset in the machine learning community; it consists of 4 features and 1 target. However, visualizing all the points in 4-D is quite difficult. Instead, we can use PCA to project all the points from the 4-D space into a 2-D space and visualize them easily. First, load the dataset:

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
df = pd.read_csv(url, names=['v1','v2','v3','petal width','target'])

Next, we need to STANDARDIZE as I mentioned above. Notice that we only need to standardize the features here.

x = StandardScaler().fit_transform(df.iloc[:,:-1])

Find pc1 and pc2:

pca = PCA(n_components=2) 
pca.fit(x)
pcs = pca.transform(x)
pcs = pd.DataFrame(pcs, columns=['pc1','pc2'])
pcs['target'] = df.target #define the target for each set of pcs

Let’s now visualize it in 2-D:

plt.figure(figsize=(14,9))
sns.scatterplot(x='pc1', y='pc2', hue='target', data=pcs)

We can see that each species can still be separated well when projected into a 2-D space. Another finding is that red (Iris-setosa) differs a lot from the other two species. The takeaway is that PCA can also serve as a tool for data visualization, even though that use is less common.
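To quantify how much information the 2-D projection keeps, we can also print the explained variance ratio of the fitted pca object (a small sketch; run it to see the exact numbers):

print(pca.explained_variance_ratio_)        # variance fraction captured by pc1 and pc2
print(pca.explained_variance_ratio_.sum())  # total variance retained in 2-D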

With my Image

Yes, my image. I have a British Shorthair who would love to be the model for this post. “It’s me guys! What’s up? XD. My name is Grey because I am.”

Mr. Grey, Ovuvuẹnvuẹnvuẹn Eyẹntuẹnwẹnvuẹn Ugbẹn’ugbẹn Osas

So we will PCA-transform Mr. Grey today. PCA is pretty popular for image decomposition nowadays because it's a variance maximization method. With the power of PCA, we can decompose pictures easily so that they retain the most information with the least memory required. Let's jump straight into it.

img = image.imread('mycat.png')   # load the picture as a (height, width, channels) array
plt.imshow(img)
plt.title('My Cat')

Still cute in Python Matplotlib
def image_decomposition(image, n_components):
    # fit PCA on the rows of the image and reconstruct it from n_components PCs
    pca = PCA(n_components=n_components, svd_solver='randomized')
    data = image.reshape(image.shape[0], -1)   # flatten to (height, width * channels)
    data = pca.fit_transform(data)
    print(f'With {n_components} principal components, the explained variance ratio is: {pca.explained_variance_ratio_.sum()}')
    temp = pca.inverse_transform(data)         # project back to the original pixel space
    temp = temp.reshape(image.shape)
    plt.imshow(temp)
    plt.title(f'My cat with only {n_components} principal components')

image_decomposition(img, 64)
# With 64 principal components, the explained variance ratio is: 0.994881272315979

Remains my cuteness with 64 pcs

Mr. Grey still looks like Mr. Grey with 64 principal components.

image_decomposition(img, 32)
# With 32 principal components, the explained variance ratio is: 0.9856431484222412

EHH still rocking my baby face
image_decomposition(img, 16)
# With 16 principal components, the explained variance ratio is: 0.9685052037239075

Losing it…
image_decomposition(img, 4)
# With 4 principal components, the explained variance ratio is: 0.8802358508110046

Oh well…
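Why does this save memory? To rebuild the image we only need the PC scores, the principal axes, and the per-column mean, which is far fewer numbers than the original pixels. Here is a rough back-of-the-envelope sketch using the same reshape convention as image_decomposition (height rows, width × channels columns):

h, w, c = img.shape
k = 64                                        # number of principal components kept
original_size = h * w * c                     # floats in the original image array
compressed_size = h * k + k * w * c + w * c   # scores + components_ + mean_
print(original_size, compressed_size)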

Summary

PCA is basically linear algebra, and it's a hot topic in interviews. From this blog, you should be able to answer:

  • How does PCA work?
  • How do we find W such that the variance of z is maximized?
  • How do we project high-dimensional data into a lower-dimensional space?
  • How do we use PCA on Mr. Grey or on your own images? :D

