Another Dive into PCA in a Practical View


Understanding how to use PCA for applications as a machine learning practitioner

There are thousands of articles on Towards Data Science on PCA-related topics. Well, I am contributing my view to the thousands. However, I am trying my best to explain PCA in a code-first, practical manner that may change your view on PCA. Whether you are a beginner or a PCA master, I am sure you will find this blog refreshing and useful. Principal Component Analysis, or PCA, is a powerful tool that's widely used for data science applications such as dimension reduction, feature extraction, and data visualization.

PCA is commonly addressed in interviews, so it's important to understand the definition of PCA:

The orthogonal projection of the data onto a lower-dimensional linear space (the principal subspace), such that the variance of the projected data is maximized. — Hotelling, 1933

This formal definition captures the essence of PCA: variance maximization in lower dimensions. Let's dive into the details of PCA by using code examples.
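In symbols (a standard formulation; the notation is mine rather than the post's), the first principal direction solves a constrained maximization over unit vectors:

\[
\mathbf{w}_1 = \arg\max_{\lVert \mathbf{w} \rVert = 1} \operatorname{Var}\left(\mathbf{w}^{\top}\mathbf{x}\right)
             = \arg\max_{\lVert \mathbf{w} \rVert = 1} \mathbf{w}^{\top} S \,\mathbf{w}
\]

where S is the sample covariance matrix of the centered data; the maximizer is the eigenvector of S with the largest eigenvalue, which is exactly what the code examples below recover.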

Get Started

To get started with this post and the following examples, you will need:

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from matplotlib import image
%matplotlib inline
plt.style.use('ggplot')

We will dig into three examples of PCA:

  • with a mock dataset
  • with the IRIS dataset
  • with my IMAGE

I am sure after these three examples, you will understand PCA and know how to use it properly in applications.

With a mock dataset

Before jumping into the code, we can simplify PCA to this problem: we have some data x, and we want to find z = Wx such that dim(z) < dim(x). But how do we find W? Well, it's pretty hard to explain this in high dimensions, so let's consider a very simple example in 2-D: we have a vector x in 2-D space, and we have to find W such that z covers the most variance.

As you may remember from your linear algebra class, the dot product w · x equals the length of x times the cosine of the angle between w and x; for a unit-length w, it is the length of the projection of x onto w.

We can see that if we want z to cover more variance, we need to rotate W so that it follows the direction of x. We will see this concretely in the mock example. Let's start with the code.
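Before the sklearn example, here is a minimal sketch of that intuition (the data and variable names are mine, mirroring the mock dataset below): sweep the angle of a unit vector w and measure the variance of the projected data z = w · x. The variance peaks when w points along the direction the data is stretched in.

import numpy as np

# toy 2-D data stretched along y ≈ 2x (mirrors the mock dataset below), then centered
rng = np.random.default_rng(0)
x = np.arange(1, 10, dtype=float)
y = 2 * x + rng.random(9) * 2
data = np.column_stack([x, y])
data = data - data.mean(axis=0)

# sweep the direction of the unit vector w and record the variance of z = data @ w
angles = np.linspace(0, np.pi, 180)
variances = [np.var(data @ np.array([np.cos(a), np.sin(a)])) for a in angles]

best = angles[int(np.argmax(variances))]
print(f'direction with maximum variance: {np.degrees(best):.1f} degrees')  # close to arctan(2) ≈ 63 degrees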

x = np.arange(1,10)
y = 2*x + np.random.rand(9) * 2
plt.scatter(x,y)
plt.xlabel('experience(year)')
plt.ylabel('salary(k)')

As we can see, we have some data x, and we want to find W such that z = Wx covers the most variance. If you project each point onto a line, the variance covered in direction 2 is much larger than in direction 1 (direction 1 and direction 2 refer to two different choices of W). However, this is not yet the W that will produce the first principal component. Why? WE NEED TO STANDARDIZE FIRST! Remember, before you perform PCA, make sure you scale your data so that each column has a standard deviation of 1 and a mean of 0. Let's standardize the mock data first:

df = pd.DataFrame({'v1': x, 'v2':y})
df = StandardScaler().fit_transform(df)
df = pd.DataFrame(df,columns=['x','y'])
plt.scatter(df.x, df.y)

Standardize the data using StandardScaler from sklearn.preprocessing. Now we can start finding W:

pca = PCA(n_components=1)
pca.fit(df)
pc1 = pca.transform(df)
pc1 
# array([[-2.1107132 ],
#       [-1.66336608],
#       [-1.04335203],
#       [-0.53589344],
#       [-0.0428135 ],
#       [ 0.39641593],
#       [ 1.01178638],
#       [ 1.67023775],
#       [ 2.3176982 ]])
inverse = pca.inverse_transform(pc1)
inverse = pd.DataFrame(inverse, columns=['x','y'])
plt.scatter(df.x, df.y, label='standardized x')
plt.plot(inverse.x, inverse.y,'b',label='w')
plt.legend()

We got the direction of W by inverse transforming principal component one (z) back into the original space, which gives the blue line in the plot. Mathematically, we usually find W with a Lagrange multiplier. I am not going to dig into the math, but the solution to maximizing var(z) given x is: w_i is the eigenvector of the covariance matrix of x corresponding to the i-th largest eigenvalue. By the way, you can also solve it by gradient descent.
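To check this claim numerically, here is a small sketch (not part of the original post) that reuses the standardized df and the fitted pca object from the mock example above, and compares sklearn's first principal direction with the top eigenvector of the covariance matrix; the two agree up to an arbitrary sign flip.

# eigen-decomposition of the covariance matrix of the standardized data
cov = np.cov(df.values, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # np.linalg.eigh returns eigenvalues in ascending order
top_eigvec = eigvecs[:, -1]              # eigenvector for the largest eigenvalue

# sklearn's first principal direction (from the pca object fitted above)
w = pca.components_[0]

# the two directions match up to a sign flip
print(np.allclose(np.abs(w), np.abs(top_eigvec)))  # True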

To illustrate the process of finding W, I’d like to include a beautiful gif:

This is a 2-D illustration of PCA; the rotating black line is W. I'd like to include a 3-D illustration as well:

We can see that the red line is the W that produces PC1, the green line PC2, and the blue line PC3.

With the IRIS dataset

The Iris dataset is a very famous dataset in the machine learning community; it consists of 4 features and 1 target. However, visualizing points in 4-D is quite difficult. Instead, we can use PCA to project all the points from the 4-D space into a 2-D space and visualize them easily. First, load the dataset:

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
df = pd.read_csv(url, names=['sepal length','sepal width','petal length','petal width','target'])

Next, we need to STANDARDIZE as I mentioned above. Notice that we only need to standardize the features here.

x = StandardScaler().fit_transform(df.iloc[:,:-1])

Find pc1 and pc2:

pca = PCA(n_components=2) 
pca.fit(x)
pcs = pca.transform(x)
pcs = pd.DataFrame(pcs, columns=['pc1','pc2'])
pcs['target'] = df.target #define the target for each set of pcs

Let’s now visualize it in 2-D:

plt.figure(figsize=(14,9))
sns.scatterplot(data=pcs, x='pc1', y='pc2', hue='target')

We can see that each species can still be separated well when projected into a 2-D space. Another finding is that red (Iris-setosa) differs a lot from the other two species. The takeaway is that PCA can also serve as a data visualization tool, even though that is not its most common use.
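It is also worth checking how much of the original variance the 2-D projection keeps. A short sketch, reusing the pca object fitted above (the values in the comments are the well-known results for the standardized Iris features, roughly):

print(pca.explained_variance_ratio_)        # approximately [0.73, 0.23]
print(pca.explained_variance_ratio_.sum())  # approximately 0.96, so the 2-D plot keeps most of the variance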

With my Image

Yes, my image. I have a British Shorthair who would love to be the model for this post. "It's me guys! What's up? XD. My name is Grey because I am."

Mr. Grey, Ovuvuẹnvuẹnvuẹn Eyẹntuẹnwẹnvuẹn Ugbẹn’ugbẹn Osas

So we will PCA-transform Mr. Grey today. PCA is pretty popular for image decomposition because it is a variance maximization method: with it, we can decompose a picture so that it retains most of the information while requiring far less memory. Let's jump straight into it.

img = image.imread('mycat.png')  # read the image; 'img' avoids shadowing the matplotlib image module
plt.imshow(img)
plt.title('My Cat')

Still cute in Python Matplotlib

def image_decomposition(img, n_components):
    pca = PCA(n_components=n_components, svd_solver='randomized')
    data = img.reshape(img.shape[0], -1)  # flatten each pixel row (width x channels) into one feature vector
    data = pca.fit_transform(data)
    print(f'With {n_components} principal components, the explained variance ratio is: {pca.explained_variance_ratio_.sum()}')
    temp = pca.inverse_transform(data)
    temp = temp.reshape(img.shape)
    temp = np.clip(temp, 0, 1)  # keep reconstructed pixel values in the valid range for imshow
    plt.imshow(temp)
    plt.title(f'My cat with only {n_components} principal components')

image_decomposition(img, 64)
# With 64 principal components, the explained variance ratio is: 0.994881272315979

Remains my cuteness with 64 pcs

Mr. Grey still looks like Mr. Grey with 64 principal components.

image_decomposition(img, 32)
# With 32 principal components, the explained variance ratio is: 0.9856431484222412

EHH still rocking my baby face

image_decomposition(img, 16)
# With 16 principal components, the explained variance ratio is: 0.9685052037239075

Losing it…

image_decomposition(img, 4)
# With 4 principal components, the explained variance ratio is: 0.8802358508110046

Oh well…
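To put the memory argument in numbers, here is a rough sketch (my own back-of-the-envelope accounting, not from the original post): compare how many numbers the original pixel array needs against the PCA representation, i.e. the transformed rows plus the stored components and mean.

def compression_ratio(img, n_components):
    pca = PCA(n_components=n_components, svd_solver='randomized')
    data = img.reshape(img.shape[0], -1)
    reduced = pca.fit_transform(data)

    original_size = data.size  # rows x (width * channels)
    compressed_size = reduced.size + pca.components_.size + pca.mean_.size
    return original_size / compressed_size

print(compression_ratio(img, 64))  # a ratio above 1 means the PCA representation stores fewer numbers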

Summary

PCA is basically linear algebra, and it's a hot topic in interviews. From this blog, you should be able to answer:

  • How does PCA work?
  • How do we find W such that the variance of z is maximized?
  • How do we project high-dimensional data into a lower-dimensional space?
  • How do we use PCA on Mr. Grey or your own images? :D

