Comparison of PCA and Autoencoders for Dimensionality Reduction
Jun 18 · 8 min read
Dimensionality reduction is the technique of reducing the feature space to obtain a stable and statistically sound machine learning model while avoiding the curse of dimensionality. There are two main approaches to dimensionality reduction: Feature Selection and Feature Transformation.
The Feature Selection approach tries to keep a subset of important features and remove collinear or less important ones. One can read more about it here.
Feature Transformation, also known as Feature Extraction, tries to project high-dimensional data into lower dimensions. Some Feature Transformation techniques are PCA, Matrix Factorisation, Autoencoders, t-SNE, UMAP, etc.
Through this blog post, I intend to do a deep dive into PCA and Autoencoders. We will see the advantages and shortcomings of both techniques, along with an interesting example to understand them clearly. The complete source code of the solution can be found here.
Principal Component Analysis
Principal Component Analysis is an unsupervised technique in which the original data is projected onto the directions of highest variance. These directions are orthogonal to each other, so the projected features have zero (or, numerically, very close to zero) correlation. The feature transformation is linear, and the methodology is:
Step 1: Calculate the covariance matrix of the data, which has n dimensions (this is the correlation matrix if the features are standardised first). The matrix will be of shape n*n.
Step 2: Calculate the eigenvectors and eigenvalues of this matrix.
Step 3: Take the k eigenvectors with the highest eigenvalues.
Step 4: Project the original dataset onto these k eigenvectors, resulting in k dimensions where k ≤ n.
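The four steps above can be sketched directly in NumPy. This is a minimal illustration rather than the implementation used later in the post: the function name pca_eig and the synthetic X_demo data are assumptions made for the example, and the covariance matrix of centred data is used in Step 1.

```python
import numpy as np

def pca_eig(X, k):
    """Project X (rows = samples) onto its top-k principal directions."""
    # Centre each feature so that the covariance describes variation around the mean.
    X_centered = X - X.mean(axis=0)
    # Step 1: n x n covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)
    # Step 2: eigen-decomposition (eigh is meant for symmetric matrices).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 3: eigh returns eigenvalues in ascending order, so sort descending and keep k.
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    # Step 4: project the centred data onto the k eigenvectors.
    return X_centered @ components, components, eigvals[order]

rng = np.random.default_rng(0)
# Synthetic, correlated features (hypothetical data, only for illustration).
X_demo = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Z, components, eigvals = pca_eig(X_demo, k=2)
print(Z.shape)
print(np.round(np.cov(Z, rowvar=False), 6))  # off-diagonal entries are ~0
```

The covariance of the projected data is diagonal, with the kept eigenvalues on the diagonal, which is the "uncorrelated features" property described above.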
Autoencoders
An autoencoder is an unsupervised artificial neural network that compresses the data to a lower dimension and then reconstructs the input. It finds a lower-dimensional representation of the data by focusing on the important features and getting rid of noise and redundancy. It is based on an encoder-decoder architecture: the encoder encodes the high-dimensional data into a lower dimension, and the decoder takes the lower-dimensional data and tries to reconstruct the original high-dimensional data.
In the above diagram, X is the input data, z is the lower-dimensional representation of X, and X’ is the reconstructed input. The mapping from higher to lower dimensions can be linear or non-linear depending on the choice of activation function.
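In symbols, the encoder-decoder idea can be sketched as follows for a single input row x. The weight matrices W_e and W_d, the biases b_e and b_d, the activation functions f and g, and the sample count N are notation introduced here for illustration, not the post's own; the squared-error loss corresponds, up to a constant factor, to the mean squared error used in the code later in the post.

```latex
z = f(W_e x + b_e), \qquad
x' = g(W_d z + b_d), \qquad
\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - x'_i \rVert^2
```

When f and g are both the identity, the whole mapping is linear; a non-linear f such as ReLU makes the encoding non-linear.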
Comparison: PCA versus Autoencoders
- PCA is a linear transformation of the data, while an AE can be linear or non-linear depending on the choice of activation function.
- PCA is fast because efficient algorithms exist to compute it, while an AE is trained through gradient descent and is comparatively slower.
- PCA projects data onto dimensions that are orthogonal to each other, resulting in zero or close to zero correlation in the projected data. AE-transformed data carries no such guarantee, because the AE is trained merely to minimize the reconstruction loss.
- PCA is a simple linear transformation of the input space onto the directions of maximum variation, while an AE is a more sophisticated technique that can model relatively complex relationships and non-linearities.
- One rule of thumb could be the size of the data. Go with PCA for small datasets and AE for comparatively larger datasets.
- PCA's hyperparameter is ‘k’, i.e. the number of orthogonal dimensions to project the data onto, while for an AE it is the architecture of the neural network.
- An AE with a single layer and linear activation performs similarly to PCA. An AE with multiple layers and non-linear activation functions, termed a Deep Autoencoder, is prone to overfitting, which can be controlled by regularisation and careful design. Please refer to the two blogs below to learn more about it.
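The last point, that a single-layer linear AE behaves like PCA, can be illustrated with a small NumPy experiment. This is a sketch under several assumptions: the data is synthetic, the autoencoder has no bias terms, and it is trained with plain full-batch gradient descent (the learning rate and step count are arbitrary choices), which is not how the Keras models later in the post are trained. PCA gives the best possible rank-k linear reconstruction, so the linear AE can approach the PCA error but not go below it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, correlated data (hypothetical; not the post's image), standardised per feature.
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
k = 2

# PCA reconstruction via SVD: keep the top-k right singular vectors.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T @ Vt[:k]
mse_pca = np.mean((X - X_pca) ** 2)

# Linear autoencoder without biases: X -> X @ W_enc -> (X @ W_enc) @ W_dec.
W_enc = rng.normal(scale=0.1, size=(X.shape[1], k))
W_dec = rng.normal(scale=0.1, size=(k, X.shape[1]))
mse_start = np.mean((X - X @ W_enc @ W_dec) ** 2)

lr = 0.01
for _ in range(5000):
    Z = X @ W_enc                                 # encode
    err = Z @ W_dec - X                           # reconstruction error
    grad_dec = 2 * Z.T @ err / len(X)             # gradient w.r.t. decoder weights
    grad_enc = 2 * X.T @ err @ W_dec.T / len(X)   # gradient w.r.t. encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse_ae = np.mean((X - X @ W_enc @ W_dec) ** 2)
print(f"PCA MSE: {mse_pca:.4f}, linear AE MSE: {mse_ae:.4f}")
```

How close the two numbers end up depends on the data and on how long the AE is trained; the encoded features themselves generally differ from the principal components, since the AE is only asked to span a good subspace, not to produce orthogonal, uncorrelated axes.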
Image Data Example: Understanding PCA and Autoencoders
Let’s take the image below and perform dimensionality reduction on it using the two methods.
The image has dimensions 360 * 460. Another way to look at it is as a dataset with 360 data points and 460 features/dimensions.
We will try to reduce the dimensions from 460 to just 10% of that, i.e. 46 dimensions, first using PCA and then AE. Let’s see the difference in reconstruction and other properties.
Using PCA for Dimensionality Reduction
from sklearn.decomposition import PCA  # assumed import; not shown in the original post

pct_reduction = 0.10
reduced_pixel = int(pct_reduction * original_dimensions[1])

# Applying PCA
pca = PCA(n_components=reduced_pixel)
pca.fit(image_matrix)

# Transforming the input matrix
X_transformed = pca.transform(image_matrix)
print("Original Input dimensions {}".format(original_dimensions))
print("New Reduced dimensions {}".format(X_transformed.shape))
Output
Original Input dimensions (360, 460)
New Reduced dimensions (360, 46)
Let’s check the correlation of the new transformed features coming out of PCA.
# Assumed imports; not shown in the original post
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df_pca = pd.DataFrame(data=X_transformed, columns=list(range(X_transformed.shape[1])))
figure = plt.figure(figsize=(10, 6))
corrMatrix = df_pca.corr()
sns.heatmap(corrMatrix, annot=False)
plt.show()
The correlation matrix shows that the new transformed features are uncorrelated with one another, with 0 correlation. The reason is that PCA projects the data onto orthogonal dimensions.
Next, we will try to reconstruct the original data using only the information in the reduced feature space.
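# Assumed imports; not shown in the original post
import numpy as np
from PIL import Image

reconstructed_matrix = pca.inverse_transform(X_transformed)
# Clip to the valid pixel range before casting, so out-of-range values do not wrap around
reconstructed_image_pca = Image.fromarray(np.uint8(np.clip(reconstructed_matrix, 0, 255)))
plt.figure(figsize=(8, 12))
plt.imshow(reconstructed_image_pca, cmap=plt.cm.gray)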
Calculating the RMSE of the reconstructed image
import math

def my_rmse(np_arr1, np_arr2):
    # Root-mean-square error over all pixels, rounded to 2 decimals
    dim = np_arr1.shape
    tot_loss = 0
    for i in range(dim[0]):
        for j in range(dim[1]):
            tot_loss += math.pow((np_arr1[i, j] - np_arr2[i, j]), 2)
    return round(math.sqrt(tot_loss / (dim[0] * dim[1] * 1.0)), 2)

error_pca = my_rmse(image_matrix, reconstructed_matrix)
The RMSE is 11.84 (lower is better).
If there is no difference between the original and the reconstructed image, the RMSE will be 0. If around 120 PCA dimensions are kept instead of 46, the RMSE is close to 0.
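The same RMSE can be computed without explicit loops, and the effect of the number of components on the reconstruction error can be explored with a short sweep. The sketch below is an assumption-laden illustration: rmse and pca_rmse_for_k are hypothetical helper names, PCA is done through NumPy's SVD rather than scikit-learn, and the data is a synthetic stand-in for the post's image. Unlike my_rmse, the result is not rounded.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two equally shaped arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def pca_rmse_for_k(X, ks):
    """Reconstruction RMSE of PCA for each number of components in ks."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    results = {}
    for k in ks:
        # Project onto the first k directions, then map back and restore the mean.
        X_hat = Xc @ Vt[:k].T @ Vt[:k] + mean
        results[k] = rmse(X, X_hat)
    return results

rng = np.random.default_rng(0)
# Synthetic stand-in data (hypothetical; not the post's image).
X_demo = rng.normal(size=(60, 40)) @ rng.normal(size=(40, 40))
errors = pca_rmse_for_k(X_demo, ks=[5, 10, 20, 40])
for k, e in errors.items():
    print(k, round(e, 4))
```

The error never increases as k grows and reaches zero (up to floating-point noise) once all directions are kept; on a real image it typically gets close to zero well before that, which is the behaviour described above for around 120 dimensions.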
Using Single-layer Autoencoders with Linear Activation for Dimensionality Reduction
# Assumed imports; not shown in the original post
from sklearn.preprocessing import StandardScaler
from keras.layers import Input, Dense
from keras.models import Model

# Standardise the data
X_org = image_matrix.copy()
sc = StandardScaler()
X = sc.fit_transform(X_org)

# Size of the encoded representation
encoding_dim = reduced_pixel

# Input placeholder: one image row of img.width pixels
input_img = Input(shape=(img.width,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='linear')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(img.width, activation=None)(encoded)

# This model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)
# Encoder: maps an input to its encoded representation
encoder = Model(input_img, encoded)

# Placeholder for an encoded (encoding_dim-dimensional) input
encoded_input = Input(shape=(encoding_dim,))
# Retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# Decoder: maps an encoded representation back to a reconstruction
decoder = Model(encoded_input, decoder_layer(encoded_input))

autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
autoencoder.fit(X, X, epochs=500, batch_size=16, shuffle=True)

encoded_imgs = encoder.predict(X)
decoded_imgs = decoder.predict(encoded_imgs)
Let’s check the correlation of the new transformed features coming out of AE.
df_ae = pd.DataFrame(data=encoded_imgs, columns=list(range(encoded_imgs.shape[1])))
figure = plt.figure(figsize=(10, 6))
corrMatrix = df_ae.corr()
sns.heatmap(corrMatrix, annot=False)
plt.show()
The correlation matrix shows that the new transformed features are somewhat correlated: the Pearson correlation coefficient deviates a lot from 0. The reason is that the AE is trained merely to minimize the reconstruction loss.
Next, we will try to reconstruct the original data using only the reduced feature space.
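# Undo the standardisation to get back to pixel values
X_decoded_ae = sc.inverse_transform(decoded_imgs)
# Clip to the valid pixel range before casting, so out-of-range values do not wrap around
reconstructed_image_ae = Image.fromarray(np.uint8(np.clip(X_decoded_ae, 0, 255)))
plt.figure(figsize=(8, 12))
plt.imshow(reconstructed_image_ae, cmap=plt.cm.gray)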
Calculating the RMSE of the reconstructed image.
error_ae = my_rmse(image_matrix,X_decoded_ae)
The RMSE is 12.15, close to PCA’s RMSE of 11.84. An autoencoder with a single layer and linear activation performs similarly to PCA.
Using Two-layer Autoencoders with Non-Linear Activation for Dimensionality Reduction
input_img = Input(shape=(img.width,))
encoded1 = Dense(128, activation='relu')(input_img)
encoded2 = Dense(reduced_pixel, activation='relu')(encoded1)
decoded1 = Dense(128, activation='relu')(encoded2)
decoded2 = Dense(img.width, activation=None)(decoded1)

autoencoder = Model(input_img, decoded2)
autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
autoencoder.fit(X, X, epochs=500, batch_size=16, shuffle=True)

# Encoder: maps an input to its reduced representation
encoder = Model(input_img, encoded2)
# Note: this "decoder" takes the original input, so it is the full
# autoencoder (encode, then decode) rather than a standalone decoder
decoder = Model(input_img, decoded2)

encoded_imgs = encoder.predict(X)
decoded_imgs = decoder.predict(X)
Next, we will try to reconstruct the original data using only the reduced feature space.
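# Undo the standardisation to get back to pixel values
X_decoded_deep_ae = sc.inverse_transform(decoded_imgs)
# Clip to the valid pixel range before casting, so out-of-range values do not wrap around
reconstructed_image_deep_ae = Image.fromarray(np.uint8(np.clip(X_decoded_deep_ae, 0, 255)))
plt.figure(figsize=(8, 12))
plt.imshow(reconstructed_image_deep_ae, cmap=plt.cm.gray)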
Calculating the RMSE of the reconstructed image.
error_dae = my_rmse(image_matrix,X_decoded_deep_ae)
The RMSE is 8.57. The gain over PCA is about 28% with the same number of reduced dimensions.
An autoencoder with an extra layer and non-linear activation captures the non-linearity in the image better. It is able to capture complex patterns, as well as sudden changes in pixel values, better than PCA, though this comes at the cost of relatively higher training time and resources.
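The gain over PCA follows from the two RMSE values reported in this post; the variable names below are introduced only for this small check.

```python
rmse_pca = 11.84      # RMSE reported above for PCA
rmse_deep_ae = 8.57   # RMSE reported above for the two-layer autoencoder

# Relative reduction in RMSE, expressed as a percentage of the PCA error
gain_pct = (rmse_pca - rmse_deep_ae) / rmse_pca * 100
print(round(gain_pct, 1))  # 27.6, i.e. roughly 28%
```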
Conclusion
Through this blog post, we did a deep dive into PCA and Autoencoders. We also saw the advantages and shortcomings of both techniques. The concepts were tried on an image dataset, where an autoencoder with an extra layer and non-linear activation outperformed PCA, though at the cost of higher training time and resources. The complete source code of the solution can be found here.
If you have any doubts or queries, do reach out to me. I would be interested to know whether you have faced the problem of high dimensionality and which approaches you tried to overcome it.
About the author
Abhishek Mungoli is a seasoned Data Scientist with experience in the ML field, a Computer Science background spanning various domains, and a problem-solving mindset. He has excelled in various Machine Learning and Optimization problems specific to Retail. He is enthusiastic about implementing Machine Learning models at scale and about knowledge sharing via blogs, talks, meetups, papers, etc.
My motive always is to simplify the toughest of things to their simplest version. I love problem-solving, data science, product development, and scaling solutions. I love exploring new places and working out in my leisure time. Follow me on Medium, Linkedin or Instagram and check out my previous posts. I welcome feedback and constructive criticism. Some of my blogs:
- Experience the power of the Genetic Algorithm
- 5 Mistakes every Data Scientist should avoid
- Decomposing Time Series in a simple & intuitive way
- How GPU Computing literally saved me at work?
- Information Theory & KL Divergence Part I and Part II
- Process Wikipedia Using Apache Spark to Create Spicy Hot Datasets
- A Semi-Supervised Embedding based Fuzzy Clustering
- Compare which Machine Learning Model performs Better
- Analyzing Fitbit Data to Demystify Bodily Pattern Changes Amid Pandemic Lockdown
- Myths and Reality around Correlation
- A Guide to Becoming Business-Oriented Data Scientist