Dimensionality Reduction: PCA versus Autoencoders


Comparison of PCA and Autoencoders for Dimensionality Reduction

Jun 18 ·8min read

Dimensionality Reduction: PCA versus Autoencoders — Picture by Billy Huynh on Unsplash

Dimensionality reduction is a technique for reducing the feature space to obtain a stable and statistically sound machine learning model that avoids the curse of dimensionality. There are two main approaches to dimensionality reduction: Feature Selection and Feature Transformation.

The Feature Selection approach tries to keep a subset of the important features and remove collinear or less important ones. One can read more about it here.
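To illustrate the idea of removing collinear features, here is a minimal sketch using a simple greedy correlation filter (not necessarily the method in the linked post). The hypothetical helper drop_collinear_features keeps a column only if its absolute Pearson correlation with every previously kept column is at most a threshold; the 0.95 default and the synthetic data are assumptions.

```python
import numpy as np


def drop_collinear_features(X, threshold=0.95):
    """Return indices of columns to keep, greedily dropping any column whose
    absolute Pearson correlation with an already-kept column exceeds threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # (n_features, n_features)
    keep = []
    for j in range(X.shape[1]):
        # keep column j only if it is not highly correlated with any kept column
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return keep


rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# column 2 is almost exactly 2 * column 0, i.e. collinear with it
X = np.column_stack([a, b, 2 * a + 0.01 * rng.normal(size=200)])
print(drop_collinear_features(X))  # [0, 1]
```

The result depends on column order: of two collinear columns, the one that appears first is kept.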

Feature Transformation, also known as Feature Extraction, tries to project high-dimensional data into lower dimensions. Some Feature Transformation techniques are PCA, matrix factorisation, autoencoders, t-SNE, UMAP, etc.

In this blog post, I do a deep dive into PCA and autoencoders. We will look at the advantages and shortcomings of both techniques and work through an interesting example to understand them clearly. The complete source code of the solution can be found here.

Principal Component Analysis

Principal Component Analysis is an unsupervised technique in which the original data is projected onto the directions of highest variance. These directions are orthogonal to each other, so the projected features have zero (numerically, near-zero) correlation. The transformation is linear, and the procedure is:

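Step 1: Calculate the correlation matrix of the data, which has n dimensions. The correlation matrix has shape n × n. (Using the covariance matrix instead, as scikit-learn's PCA effectively does, corresponds to PCA on centred but unscaled data.)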

Step 2: Calculate the Eigenvectors and Eigenvalues of this matrix.

Step 3: Take the k eigenvectors with the largest eigenvalues.

Step 4: Project the standardised dataset onto these k eigenvectors, resulting in k dimensions, where k ≤ n.
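As a minimal sketch of Steps 1 to 4 (a from-scratch NumPy illustration on random data, not the code used later in this post), the hypothetical function pca_via_correlation standardises the features, eigendecomposes the correlation matrix and projects onto the top-k eigenvectors. Because the correlation matrix is the covariance of the standardised data, the resulting scores are uncorrelated and each has variance equal to its eigenvalue.

```python
import numpy as np


def pca_via_correlation(X, k):
    """Sketch of Steps 1-4: standardise X, eigendecompose its correlation
    matrix and project onto the k eigenvectors with the largest eigenvalues."""
    # Standardise each feature; the covariance of Z is then the correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 1: n x n correlation matrix
    R = np.corrcoef(X, rowvar=False)
    # Step 2: eigh is for symmetric matrices; eigenvalues come back in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    # Step 3: indices of the k largest eigenvalues, in descending order
    order = np.argsort(eigvals)[::-1][:k]
    W = eigvecs[:, order]  # n x k
    # Step 4: project the standardised data onto the k eigenvectors
    return Z @ W, eigvals[order]


rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))  # correlated features
scores, top_eigvals = pca_via_correlation(X, k=2)
print(scores.shape)  # (300, 2)
print(np.round(np.corrcoef(scores, rowvar=False), 6))  # off-diagonal entries ≈ 0
```

scikit-learn's PCA, used later in this post, centres the data but does not scale it, so its components generally differ from these unless the features are standardised first.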

Autoencoders

An autoencoder is an unsupervised artificial neural network that compresses the data into a lower dimension and then reconstructs the input from it. It learns a lower-dimensional representation of the data by focusing on the important features and discarding noise and redundancy. It is based on an encoder-decoder architecture: the encoder maps the high-dimensional data to a lower dimension, and the decoder takes the lower-dimensional data and tries to reconstruct the original high-dimensional data.

In the diagram above, X is the input data, z is the lower-dimensional representation of X, and X’ is the reconstructed input. The mapping from higher to lower dimensions can be linear or non-linear depending on the choice of activation function.
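To make X, z and X’ concrete, here is a minimal NumPy sketch of the forward pass of a one-hidden-layer autoencoder with untrained random weights. The function and parameter names are illustrative assumptions, not a library API; training (not shown) would adjust the weights to minimise the reconstruction loss mse. Passing activation=None gives a linear mapping, while passing relu makes the encoder non-linear.

```python
import numpy as np


def relu(a):
    return np.maximum(a, 0.0)


def autoencoder_forward(X, W_enc, b_enc, W_dec, b_dec, activation=None):
    """One-hidden-layer autoencoder: X -> z (bottleneck) -> X_rec (reconstruction).
    activation=None keeps the encoder linear; pass relu for a non-linear one."""
    h = X @ W_enc + b_enc                           # encoder pre-activation
    z = h if activation is None else activation(h)  # lower-dimensional code z
    X_rec = z @ W_dec + b_dec                       # decoder maps z back to input space
    return z, X_rec


rng = np.random.default_rng(0)
n_samples, n_features, code_dim = 8, 6, 2
X = rng.normal(size=(n_samples, n_features))
W_enc = rng.normal(scale=0.1, size=(n_features, code_dim))
b_enc = np.zeros(code_dim)
W_dec = rng.normal(scale=0.1, size=(code_dim, n_features))
b_dec = np.zeros(n_features)

z, X_rec = autoencoder_forward(X, W_enc, b_enc, W_dec, b_dec, activation=relu)
mse = np.mean((X - X_rec) ** 2)  # the reconstruction loss training would minimise
print(z.shape, X_rec.shape)  # (8, 2) (8, 6)
```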

Comparison: PCA versus Autoencoders

PCA is a linear transformation of the data, while an autoencoder (AE) can be linear or non-linear depending on the choice of activation function.
PCA is fast because efficient algorithms exist to compute it, while an AE is trained through gradient descent and is comparatively slower.
PCA projects data onto mutually orthogonal directions, resulting in zero or near-zero correlation in the projected data. AE-transformed data carries no such guarantee, because an AE is trained only to minimise the reconstruction loss.
PCA is a simple linear transformation of the input space onto the directions of maximum variation, while an AE is a more sophisticated technique that can model relatively complex relationships and non-linearities.
One rule of thumb is the size of the data: go with PCA for small datasets and an AE for comparatively larger ones.
PCA's hyperparameter is k, i.e. the number of orthogonal dimensions to project the data onto, while for an AE it is the architecture of the neural network.
An AE with a single layer and linear activation performs similarly to PCA. An AE with multiple layers and non-linear activation functions, termed a deep autoencoder, is prone to overfitting, which can be controlled through regularisation and careful design. Please refer to the two blogs below to learn more.
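A small NumPy experiment, on assumed synthetic data rather than the article's image, illustrates why a single-layer linear AE performs similarly to PCA: PCA gives the best possible rank-k linear reconstruction, and plain gradient descent on the linear AE's mean squared error approaches it. The sizes, learning rate and iteration count are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 10, 3  # samples, features, bottleneck size (assumed for illustration)

# Synthetic data: 3 strong latent directions plus small noise, then centred
X = rng.normal(size=(m, k)) @ rng.normal(size=(k, n)) + 0.1 * rng.normal(size=(m, n))
X = X - X.mean(axis=0)  # centring lets us omit bias terms below

# PCA: project onto the top-k right singular vectors and map back
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T @ Vt[:k]
pca_mse = np.mean((X - X_pca) ** 2)

# Single-layer linear AE: X -> X @ W1 (code) -> X @ W1 @ W2 (reconstruction)
W1 = rng.normal(scale=0.1, size=(n, k))
W2 = rng.normal(scale=0.1, size=(k, n))
lr = 0.05
for _ in range(5000):
    Z = X @ W1                        # linear code
    G = 2.0 * (Z @ W2 - X) / X.size   # gradient of the MSE w.r.t. the reconstruction
    grad_W2 = Z.T @ G
    grad_W1 = X.T @ (G @ W2.T)
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

ae_mse = np.mean((X - X @ W1 @ W2) ** 2)
print(f"PCA MSE: {pca_mse:.5f}  linear AE MSE: {ae_mse:.5f}")
```

The AE can at best match PCA's error. At convergence its code spans the same subspace as the top-k principal components, but only up to an invertible linear mix of them.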

Building Autoencoders in Keras (blog.keras.io)

Different types of Autoencoders (iq.opengenus.org)

Image Data Example: Understanding PCA and Autoencoders

Let’s take the image below and reduce its dimensions using the two methods.

The image has dimensions 360 × 460. Another way to look at it is as a dataset with 360 data points (rows) and 460 features/dimensions (columns).

We will reduce the dimensions from 460 to just 10%, i.e. 46 dimensions, first using PCA and then an AE, and compare the reconstructions and other properties.

Using PCA for Dimensionality Reduction

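import numpy as np
from PIL import Image
from sklearn.decomposition import PCA

img = Image.open("image.jpg").convert("L")  # hypothetical file name; load as greyscale
image_matrix = np.array(img, dtype=float)  # shape (height, width) = (360, 460)
original_dimensions = image_matrix.shape
pct_reduction = 0.10
reduced_pixel = int(pct_reduction * original_dimensions[1])
# Applying PCA
pca = PCA(n_components=reduced_pixel)
pca.fit(image_matrix)
# Transforming the input matrix
X_transformed = pca.transform(image_matrix)
print("Original input dimensions {}".format(original_dimensions))
print("New reduced dimensions {}".format(X_transformed.shape))

Output

Original input dimensions (360, 460)
New reduced dimensions (360, 46)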

Let’s check the correlation of the new transformed features coming out of PCA.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_pca = pd.DataFrame(data=X_transformed, columns=list(range(X_transformed.shape[1])))
figure = plt.figure(figsize=(10, 6))
corrMatrix = df_pca.corr()
sns.heatmap(corrMatrix, annot=False)
plt.show()

The correlation matrix shows that the new transformed features are uncorrelated with one another (zero correlation off the diagonal). This is because PCA projects the data onto orthogonal directions.

Next, we reconstruct the original data using only the information in the reduced feature space.

reconstructed_matrix = pca.inverse_transform(X_transformed)
# Clip to 0-255 before casting: converting out-of-range floats to uint8 does not saturate
reconstructed_image_pca = Image.fromarray(np.uint8(np.clip(reconstructed_matrix, 0, 255)))
plt.figure(figsize=(8, 12))
plt.imshow(reconstructed_image_pca, cmap=plt.cm.gray)

Calculating the RMSE of the reconstructed image

def my_rmse(np_arr1, np_arr2):
    # Root-mean-square error over all pixels, rounded to 2 decimal places
    return round(float(np.sqrt(np.mean((np_arr1 - np_arr2) ** 2))), 2)

error_pca = my_rmse(image_matrix, reconstructed_matrix)
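For reference, my_rmse computes the root-mean-square error over all m × n entries of the original matrix X and the reconstruction X', rounded to two decimal places:

```latex
\mathrm{RMSE}(X, X') = \sqrt{\frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( X_{ij} - X'_{ij} \right)^2}
```

Here m = 360 and n = 460.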

The RMSE is 11.84 (lower is better).

If the original and reconstructed images are identical, the RMSE is 0. With around 120 PCA dimensions, the RMSE is close to 0.
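A sweep over k makes this behaviour visible. The sketch below uses an SVD-based PCA (centre, project onto the top-k right singular vectors, add the mean back), which should match scikit-learn's PCA inverse_transform with default settings. The random 60 × 40 matrix is a stand-in assumption for image_matrix, and the helper name pca_rmse_for_k is hypothetical.

```python
import numpy as np


def pca_rmse_for_k(X, k):
    """RMSE between X and its reconstruction from the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_rec = Xc @ Vt[:k].T @ Vt[:k] + mean  # project, map back, restore the mean
    return np.sqrt(np.mean((X - X_rec) ** 2))


rng = np.random.default_rng(1)
X = rng.normal(size=(60, 40)) @ rng.normal(size=(40, 40))  # stand-in for an image matrix
errors = [pca_rmse_for_k(X, k) for k in (5, 10, 20, 40)]
print(np.round(errors, 3))  # decreasing; ~0 once k reaches the rank of the centred data
```

The error reaches zero (up to floating point) once k equals the rank of the centred matrix; for a real image, the k at which it becomes "close to 0" depends on the image.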

Using Single-layer Autoencoders with Linear Activation for Dimensionality Reduction

from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Dense, Input  # assumes TensorFlow's bundled Keras
from tensorflow.keras.models import Model

# Standardise the data
X_org = image_matrix.copy()
sc = StandardScaler()
X = sc.fit_transform(X_org)
# This is the size of our encoded representations
encoding_dim = reduced_pixel
# This is our input placeholder
input_img = Input(shape=(img.width,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='linear')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(img.width, activation=None)(encoded)
# This model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)
# Encoder: maps an input to its encoded representation
encoder = Model(input_img, encoded)
# Create a placeholder for an encoded (encoding_dim-dimensional) input
encoded_input = Input(shape=(encoding_dim,))
# Retrieve the last layer of the autoencoder model and create the decoder model
decoder_layer = autoencoder.layers[-1]
decoder = Model(encoded_input, decoder_layer(encoded_input))
# tf.keras's default Adadelta learning rate is small (0.001), so training can be slow
autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
autoencoder.fit(X, X,
                epochs=500,
                batch_size=16,
                shuffle=True)
encoded_imgs = encoder.predict(X)
decoded_imgs = decoder.predict(encoded_imgs)

Let’s check the correlation of the new transformed features coming out of the AE.

df_ae = pd.DataFrame(data = encoded_imgs,columns=list(range(encoded_imgs.shape[1])))
figure = plt.figure(figsize=(10,6))
corrMatrix = df_ae.corr()
sns.heatmap(corrMatrix, annot=False)
plt.show()

The correlation matrix shows that the new transformed features are somewhat correlated: the Pearson correlation coefficients deviate considerably from 0. This is because AE training only minimises the reconstruction loss and places no constraint on the correlation between the encoded features.
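The lack of decorrelation follows from the loss itself. With a linear decoder, mixing the latent code by any invertible matrix M and applying the inverse of M in the decoder leaves every reconstruction, and hence the loss, unchanged, so nothing pushes training toward uncorrelated features. The NumPy sketch below demonstrates this starting from an uncorrelated PCA code; the random stand-in data and the particular M are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))  # stand-in data
X = X - X.mean(axis=0)
k = 3

# PCA-style code: uncorrelated latent features Z and a linear decoder D
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Z, D = X @ Vt[:k].T, Vt[:k]

# Mix the code with an (arbitrary) invertible matrix M and undo it in the decoder
M = np.array([[1.0, 0.8, 0.0],
              [0.0, 1.0, 0.5],
              [0.3, 0.0, 1.0]])
Z_mixed, D_mixed = Z @ M, np.linalg.inv(M) @ D

print(np.allclose(Z @ D, Z_mixed @ D_mixed))  # True: identical reconstructions
print(np.round(np.corrcoef(Z_mixed, rowvar=False), 2))  # off-diagonals no longer 0
```

Both codes reconstruct the data equally well, yet only the PCA code is uncorrelated; gradient descent on the AE loss has no reason to prefer one over the other.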

Next, we reconstruct the original data using only the reduced feature space available to us.

X_decoded_ae = sc.inverse_transform(decoded_imgs)
# Clip to the valid pixel range before casting to uint8
reconstructed_image_ae = Image.fromarray(np.uint8(np.clip(X_decoded_ae, 0, 255)))
plt.figure(figsize=(8, 12))
plt.imshow(reconstructed_image_ae, cmap=plt.cm.gray)

Calculating the RMSE of the reconstructed image.

error_ae = my_rmse(image_matrix,X_decoded_ae)

The RMSE is 12.15, close to PCA’s RMSE of 11.84. The autoencoder with a single layer and linear activation performs similarly to PCA.

Using Two-layer Autoencoders with Non-Linear Activation for Dimensionality Reduction

input_img = Input(shape=(img.width,))
encoded1 = Dense(128, activation='relu')(input_img)
encoded2 = Dense(reduced_pixel, activation='relu')(encoded1)
decoded1 = Dense(128, activation='relu')(encoded2)
decoded2 = Dense(img.width, activation=None)(decoded1)
autoencoder = Model(input_img, decoded2)
autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
autoencoder.fit(X, X,
                epochs=500,
                batch_size=16,
                shuffle=True)
# Encoder: input -> reduced_pixel-dimensional code
encoder = Model(input_img, encoded2)
# Decoder: code -> reconstruction, reusing the two trained decoding layers
encoded_input = Input(shape=(reduced_pixel,))
decoder = Model(encoded_input, autoencoder.layers[-1](autoencoder.layers[-2](encoded_input)))
encoded_imgs = encoder.predict(X)
decoded_imgs = decoder.predict(encoded_imgs)

Next, we reconstruct the original data using only the reduced feature space available to us.

X_decoded_deep_ae = sc.inverse_transform(decoded_imgs)
# Clip to the valid pixel range before casting to uint8
reconstructed_image_deep_ae = Image.fromarray(np.uint8(np.clip(X_decoded_deep_ae, 0, 255)))
plt.figure(figsize=(8, 12))
plt.imshow(reconstructed_image_deep_ae, cmap=plt.cm.gray)

Calculating the RMSE of the reconstructed image.

error_dae = my_rmse(image_matrix,X_decoded_deep_ae)

The RMSE is 8.57, about 28% lower than PCA’s 11.84 ((11.84 - 8.57) / 11.84 ≈ 0.28) with the same number of reduced dimensions.

The autoencoder with an extra layer and non-linear activation captures the non-linearity in the image better. It captures complex patterns and sudden changes in pixel values better than PCA, though at the cost of relatively higher training time and resources.

Conclusion

In this blog post, we did a deep dive into PCA and autoencoders and saw the advantages and shortcomings of both techniques. We tried the concepts on an image dataset, where an autoencoder with an extra layer of non-linear activation outperformed PCA, though at the cost of higher training time and resources. The complete source code of the solution can be found here.

If you have any doubts or queries, do reach out to me. I would be interested to know whether you have faced the problem of high dimensionality and which approaches you tried to overcome it.

About the author:

Abhishek Mungoli is a seasoned Data Scientist with experience in the ML field, a Computer Science background spanning various domains, and a problem-solving mindset. He has excelled in various Machine Learning and Optimization problems specific to Retail, and is enthusiastic about implementing Machine Learning models at scale and sharing knowledge via blogs, talks, meetups, papers, etc.

My motive is always to simplify the toughest things to their simplest version. I love problem-solving, data science, product development, and scaling solutions. I love to explore new places and work out in my leisure time. Follow me on Medium, LinkedIn or Instagram and check out my previous posts. I welcome feedback and constructive criticism.
