Data Science Interview Questions

Category: IT Technology · Published: 4 years ago

A typical interview process for a data science position includes multiple rounds. Often, one of such rounds covers theoretical concepts, where the goal is to determine if the candidate knows the fundamentals of machine learning.

In this post, I summarize my interviewing experience, both as an interviewer and as a candidate, in a list of 160+ theoretical data science questions.

This includes the following topics:

Linear regression
Validation
Classification and logistic regression
Regularization
Decision trees
Random forest
Gradient boosting trees
Neural networks
Text classification
Clustering
Ranking: search and recommendation
Time series

The number of questions in this post might seem overwhelming — and it indeed is. Keep in mind that the interview flow is based on what the company needs and what you have worked with, so if you didn’t work with models in time series or computer vision, you shouldn’t get questions about them.

Important: don’t feel discouraged if you don’t know the answers to some of the interview questions. This is absolutely fine.

Finally, to make it simpler, I grouped the questions into three categories, based on difficulty:

:baby: easy

:woman:‍:mortar_board: medium

:man:‍:computer: expert

That’s, of course, subjective, and it’s based only on my personal opinion.

Let’s start!

Supervised machine learning

What is supervised machine learning? :baby:

Linear regression

What is regression? Which models can you use to solve a regression problem? :baby:
What is linear regression? When do we use it? :baby:
What’s the normal distribution? Why do we care about it? :baby:
How do we check if a variable follows the normal distribution? :woman:‍:mortar_board:
What if we want to build a model for predicting prices? Are prices distributed normally? Do we need to do any pre-processing for prices? :woman:‍:mortar_board:
Which methods for solving linear regression do you know? :woman:‍:mortar_board:
What is gradient descent? How does it work? :woman:‍:mortar_board:
What is the normal equation? :woman:‍:mortar_board:
What is SGD (stochastic gradient descent)? How does it differ from the usual gradient descent? :woman:‍:mortar_board:
Which metrics for evaluating regression models do you know? :baby:
What are MSE and RMSE? :baby:
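
As a quick refresher for the last two questions, here is a minimal sketch of MSE and RMSE in plain Python. The function names and the sample numbers are assumptions made up for illustration; in practice you would likely use a library implementation.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of MSE, in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))


# Made-up targets and predictions, for illustration only.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

print("MSE:", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

Because RMSE is expressed in the same units as the target, it is often the easier of the two to explain in an interview.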

Validation

What is overfitting? :baby:
How do you validate your models? :baby:
Why do we need to split our data into three parts: train, validation, and test? :baby:
Can you explain how cross-validation works? :baby:
What is K-fold cross-validation? :baby:
How do we choose K in K-fold cross-validation? What’s your favorite K? :baby:
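
To make the K-fold questions concrete, here is a minimal sketch of how the folds can be generated. The helper name `k_fold_indices` is hypothetical, and the sketch does not shuffle the data, which you would usually do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample lands in validation exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# 10 samples split into 3 folds (sizes chosen for illustration).
folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("train size:", len(train_idx), "validation:", val_idx)
```

The model is trained K times, each time on the train indices and evaluated on the held-out fold, and the K scores are then averaged.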

Classification

What is classification? Which models would you use to solve a classification problem? :baby:
What is logistic regression? When do we need to use it? :baby:
Is logistic regression a linear model? Why? :baby:
What is sigmoid? What does it do? :baby:
How do we evaluate classification models? :baby:
What is accuracy? :baby:
Is accuracy always a good metric? :baby:
What is the confusion table? What are the cells in this table? :baby:
What are precision, recall, and F1-score? :baby:
What is the precision-recall trade-off? :woman:‍:mortar_board:
What is the ROC curve? When do we use it? :woman:‍:mortar_board:
What is AUC (AU ROC)? When do we use it? :woman:‍:mortar_board:
How do we interpret the AU ROC score? :woman:‍:mortar_board:
What is the PR (precision-recall) curve? :woman:‍:mortar_board:
What is the area under the PR curve? Is it a useful metric? :woman:‍:mortar_board:
In which cases is AU PR better than AU ROC? :woman:‍:mortar_board:
What do we do with categorical variables? :woman:‍:mortar_board:
Why do we need one-hot encoding? :woman:‍:mortar_board:
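
For the questions about the confusion table, precision, recall, and F1-score, a small sketch for the binary case may help. The function names and the labels below are assumptions made up for illustration; returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def confusion_counts(y_true, y_pred):
    """Return the four cells of the binary confusion table: (TP, FP, FN, TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and their harmonic mean (F1) for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 3 positives and 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

print("TP, FP, FN, TN:", confusion_counts(y_true, y_pred))
print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))
```

Accuracy on the same data would be (TP + TN) / total, which is why it can look fine on imbalanced data even when precision and recall are poor.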

Regularization

What happens to our linear regression model if we have three columns in our data: x, y, z — and z is a sum of x and y? :woman:‍:mortar_board:
What happens to our linear regression model if the column z in the data is a sum of columns x and y and some random noise? :woman:‍:mortar_board:
What is regularization? Why do we need it? :baby:
Which regularization techniques do you know? :woman:‍:mortar_board:
What kind of regularization techniques are applicable to linear models? :woman:‍:mortar_board:
What does L2 regularization look like in a linear model? :woman:‍:mortar_board:
How do we select the right regularization parameters? :baby:
What’s the effect of L2 regularization on the weights of a linear model? :woman:‍:mortar_board:
What does L1 regularization look like in a linear model? :woman:‍:mortar_board:
What’s the difference between L2 and L1 regularization? :woman:‍:mortar_board:
Can we have both L1 and L2 regularization components in a linear model? :woman:‍:mortar_board:
What’s the interpretation of the bias term in linear models? :woman:‍:mortar_board:
How do we interpret weights in linear models? :woman:‍:mortar_board:
If the weight for one variable is higher than for another, can we say that this variable is more important? :woman:‍:mortar_board:
When do we need to perform feature normalization for linear models? When is it okay not to do it? :woman:‍:mortar_board:

Feature selection

What is feature selection? Why do we need it? :baby:
Is feature selection important for linear models? :woman:‍:mortar_board:
Which feature selection techniques do you know? :woman:‍:mortar_board:
Can we use L1 regularization for feature selection? :woman:‍:mortar_board:
Can we use L2 regularization for feature selection? :woman:‍:mortar_board:

Decision trees

What are decision trees? :baby:
How do we train decision trees? :woman:‍:mortar_board:
What are the main parameters of the decision tree model? :baby:
How do we handle categorical variables in decision trees? :woman:‍:mortar_board:
What are the benefits of a single decision tree compared to more complex models? :woman:‍:mortar_board:
How can we know which features are more important for the decision tree model? :woman:‍:mortar_board:

Random forest

What is random forest? :baby:
Why do we need randomization in random forest? :woman:‍:mortar_board:
What are the main parameters of the random forest model? :woman:‍:mortar_board:
How do we select the depth of the trees in random forest? :woman:‍:mortar_board:
How do we know how many trees we need in random forest? :woman:‍:mortar_board:
Is it easy to parallelize training of a random forest model? How can we do it? :woman:‍:mortar_board:
What are the potential problems with many large trees? :woman:‍:mortar_board:
What if, instead of finding the best split, we randomly select a few splits and pick the best of them? Will it work? :man:‍:computer:
What happens when we have correlated features in our data? :woman:‍:mortar_board:

Gradient boosting

What are gradient boosting trees? :woman:‍:mortar_board:
What’s the difference between random forest and gradient boosting? :woman:‍:mortar_board:
Is it possible to parallelize training of a gradient boosting model? How can we do it? :woman:‍:mortar_board:
What are the possible options for feature importance in gradient boosting trees? :woman:‍:mortar_board:
Are there any differences between continuous and discrete variables when it comes to feature importance of gradient boosting models? :man:‍:computer:
What are the main parameters in the gradient boosting model? :woman:‍:mortar_board:
How do you approach tuning parameters in XGBoost or LightGBM? :man:‍:computer:
How do you select the number of trees in the gradient boosting model? :woman:‍:mortar_board:

Parameter tuning

Which parameter tuning strategies (in general) do you know? :woman:‍:mortar_board:
What’s the difference between the grid search and random search parameter tuning strategies? When would you use one or the other? :woman:‍:mortar_board:

Neural networks

What kinds of problems can neural nets solve? :baby:
How does a usual fully-connected feed-forward neural network work? :woman:‍:mortar_board:
Why do we need activation functions? :baby:
What are the problems with sigmoid as an activation function? :woman:‍:mortar_board:
What is ReLU? How is it better than sigmoid or tanh? :woman:‍:mortar_board:
How can we initialize the weights of a neural network? :woman:‍:mortar_board:
What if we set all the weights of a neural network to 0? :woman:‍:mortar_board:
What regularization techniques for neural nets do you know? :woman:‍:mortar_board:
What is dropout? Why is it useful? How does it work? :woman:‍:mortar_board:

Optimization in neural networks

What is backpropagation? How does it work? Why do we need it? :woman:‍:mortar_board:
Which optimization techniques for training neural nets do you know? :woman:‍:mortar_board:
How do we use SGD (stochastic gradient descent) for training a neural net? :woman:‍:mortar_board:
What’s the learning rate? :baby:
What happens when the learning rate is too large? Too small? :baby:
How do we set the learning rate? :woman:‍:mortar_board:
What is Adam? What’s the main difference between Adam and SGD? :woman:‍:mortar_board:
When would you use Adam and when SGD? :woman:‍:mortar_board:
Do we want a constant learning rate, or is it better to change it throughout training? :woman:‍:mortar_board:
How do we decide when to stop training a neural net? :baby:
What is model checkpointing? :woman:‍:mortar_board:
Can you tell us how you approach the model training process? :woman:‍:mortar_board:

Neural networks for computer vision

How can we use neural nets for computer vision? :woman:‍:mortar_board:
What’s a convolutional layer? :woman:‍:mortar_board:
Why do we actually need convolutions? Can’t we use fully-connected layers for that? :woman:‍:mortar_board:
What’s pooling in CNN? Why do we need it? :woman:‍:mortar_board:
How does max pooling work? Are there other pooling techniques? :woman:‍:mortar_board:
Are CNNs resistant to rotations? What happens to the predictions of a CNN if an image is rotated? :man:‍:computer:
What are augmentations? Why do we need them? :baby:
What kind of augmentations do you know? :baby:
How to choose which augmentations to use? :woman:‍:mortar_board:
What kind of CNN architectures for classification do you know? :man:‍:computer:
What is transfer learning? How does it work? :woman:‍:mortar_board:
What is object detection? Do you know any architectures for that? :man:‍:computer:
What is object segmentation? Do you know any architectures for that? :man:‍:computer:

Text classification

How can we use machine learning for text classification? :woman:‍:mortar_board:
What is bag of words? How can we use it for text classification? :woman:‍:mortar_board:
What are the advantages and disadvantages of bag of words? :woman:‍:mortar_board:
What are N-grams? How can we use them? :woman:‍:mortar_board:
How large should N be for our bag of words when using N-grams? :woman:‍:mortar_board:
What is TF-IDF? How is it useful for text classification? :woman:‍:mortar_board:
Which model would you use for text classification with bag of words features? :woman:‍:mortar_board:
Would you prefer a gradient boosting trees model or logistic regression when doing text classification with bag of words? :woman:‍:mortar_board:
What are word embeddings? Why are they useful? Do you know Word2Vec? :woman:‍:mortar_board:
Do you know any other ways to get word embeddings? :man:‍:computer:
If you have a sentence with multiple words, you may need to combine multiple word embeddings into one. How would you do it? :woman:‍:mortar_board:
Would you prefer a gradient boosting trees model or logistic regression when doing text classification with embeddings? :woman:‍:mortar_board:
How can you use neural nets for text classification? :man:‍:computer:
How can we use CNN for text classification? :man:‍:computer:

Clustering

What is unsupervised learning? :baby:
What is clustering? When do we need it? :baby:
Do you know how K-means works? :woman:‍:mortar_board:
How do we select K for K-means? :woman:‍:mortar_board:
Which other clustering algorithms do you know? :woman:‍:mortar_board:
Do you know how DBSCAN works? :woman:‍:mortar_board:
When would you choose K-means and when DBSCAN? :woman:‍:mortar_board:

Dimensionality reduction

What is the curse of dimensionality? Why do we care about it? :woman:‍:mortar_board:
Do you know any dimensionality reduction techniques? :woman:‍:mortar_board:
What’s singular value decomposition? How is it typically used for machine learning? :woman:‍:mortar_board:

Ranking and search

What is the ranking problem? Which models can you use to solve it? :woman:‍:mortar_board:
What are good unsupervised baselines for text information retrieval? :woman:‍:mortar_board:
How would you evaluate your ranking algorithms? Which offline metrics would you use? :woman:‍:mortar_board:
What are precision and recall at k? :woman:‍:mortar_board:
What is mean average precision at k? :woman:‍:mortar_board:
How can we use machine learning for search? :woman:‍:mortar_board:
How can we get training data for our ranking algorithms? :woman:‍:mortar_board:
Can we formulate the search problem as a classification problem? How? :woman:‍:mortar_board:
How can we use click data as the training data for ranking algorithms? :man:‍:computer:
Do you know how to use gradient boosting trees for ranking? :man:‍:computer:
How do you do an online evaluation of a new ranking algorithm? :woman:‍:mortar_board:
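
For the offline metric questions above, here is a minimal sketch of precision, recall, and mean average precision at k. Definitions of average precision at k vary; this sketch assumes one common convention that normalizes by min(number of relevant items, k). All function names and the toy queries are made up for illustration.

```python
def precision_at_k(relevant, ranked, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(relevant, ranked, k):
    """Fraction of all relevant items that appear in the top-k results."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def average_precision_at_k(relevant, ranked, k):
    """Average of precision@i over the positions i <= k that hold a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    # Assumed convention: normalize by min(|relevant|, k).
    return total / min(len(relevant), k)


def mean_average_precision_at_k(queries, k):
    """Mean of average precision at k over (relevant, ranked) pairs, one per query."""
    return sum(average_precision_at_k(rel, ranked, k) for rel, ranked in queries) / len(queries)


# Two toy queries: a set of relevant items and the ranking a system returned.
queries = [
    ({"a", "c", "e"}, ["a", "b", "c", "d", "e"]),
    ({"x"}, ["y", "x"]),
]

print("P@3:", precision_at_k(*queries[0], 3))
print("R@3:", recall_at_k(*queries[0], 3))
print("MAP@3:", mean_average_precision_at_k(queries, 3))
```

Unlike precision at k, average precision rewards rankings that place relevant items near the top.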

Recommender systems

What is a recommender system? :baby:
What are good baselines when building a recommender system? :woman:‍:mortar_board:
What is collaborative filtering? :woman:‍:mortar_board:
How can we incorporate implicit feedback (clicks, etc.) into our recommender systems? :woman:‍:mortar_board:
What is the cold start problem? :woman:‍:mortar_board:
What are possible approaches to solving the cold start problem? :woman:‍:mortar_board::man:‍:computer:

Time series

What is a time series? :baby:
How is time series different from the usual regression problem? :baby:
Which models do you know for solving time series problems? :woman:‍:mortar_board:
If there’s a trend in our series, how can we remove it? And why would we want to do it? :woman:‍:mortar_board:
You have a series with only one variable “y” measured at time t. How do you predict “y” at time t+1? Which approaches would you use? :woman:‍:mortar_board:
You have a series with a variable “y” and a set of features. How do you predict “y” at t+1? Which approaches would you use? :woman:‍:mortar_board:
What are the problems with using trees for solving time series problems? :woman:‍:mortar_board:

That was a long list! I hope you found it useful. Good luck with your interviews!

The post is based on this thread on Twitter.
