A Guide to Building Your First Machine Learning Model and Starting Your Data Science Career

Category: IT Technology · Published: 4 years ago

Step 1: Complete Kaggle’s course on Machine Learning

Screenshot of Kaggle’s Machine Learning Course

Personally, I found Kaggle’s Intro to Machine Learning course to be the best resource for getting started, because it’s VERY basic: it provides you with the bare minimum needed to build your first machine learning model. You might think this is a bad thing, but as a beginner, there are a few reasons why it’s actually good:

It’s much easier to understand how everything connects when you’re given less information. Many small steps beat a few large strides.
Confidently building your first machine learning model will give you the motivation to dive deeper into your learning. As with the first point, set many small, attainable goals alongside your big, audacious ones.

As you go through this course, keep the following points in mind:

Focus on understanding the code snippets. You’ll be reusing this code later on, so it’s in your best interest to learn the rationale behind each piece. Don’t worry about memorizing it; there’s nothing wrong with referring back to the course as a resource.
Don’t stress over the fact that the course only covers decision tree and random forest models. You’ll learn later on that switching to a different machine learning model only requires changing a couple of lines of code.
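The claim that switching models only takes a couple of lines can be sketched as follows. This is a minimal example rather than the course’s exact code: it assumes pandas and scikit-learn are installed, and the small housing table (rooms, area, price) is made up purely for illustration. The helper validation_mae is a hypothetical name introduced here; the rest uses scikit-learn’s DecisionTreeRegressor, RandomForestRegressor, train_test_split and mean_absolute_error.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data standing in for a real Kaggle dataset
data = pd.DataFrame({
    "rooms": [2, 3, 3, 4, 4, 5, 2, 3, 5, 6, 4, 3],
    "area": [50, 70, 65, 90, 95, 120, 45, 72, 130, 150, 88, 68],
    "price": [150, 210, 200, 280, 290, 360, 140, 215, 380, 450, 270, 205],
})
features = ["rooms", "area"]
X = data[features]
y = data["price"]

# Hold out part of the data so each model is scored on rows it never saw
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)


def validation_mae(model):
    """Hypothetical helper: fit `model` on the training split, return its validation MAE."""
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Only the line that constructs the model changes between the two runs
tree_mae = validation_mae(DecisionTreeRegressor(random_state=1))
forest_mae = validation_mae(RandomForestRegressor(n_estimators=50, random_state=1))
print(f"Decision tree MAE: {tree_mae:.1f}")
print(f"Random forest MAE: {forest_mae:.1f}")
```

With a dozen made-up rows the two scores carry no real meaning; the point is that validation_mae is reused unchanged and only the constructor call differs.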

Once you finish the course, you can move on to the second step of making your own machine learning model:

Step 2: Find a dataset on Kaggle and recreate your random forest model

Kaggle provides much more than online courses: it hosts thousands of datasets that you can explore and build models with. Below are the steps required to create your very own first model.

First, go to Kaggle’s list of datasets here and pick one that interests you. Think about what variable you would like to predict. Do you want to predict life expectancy? Real estate prices? Taxi usage? The world is your oyster.

Then click on ‘New Notebook’. This is where you’ll be replicating the code that you learned from Kaggle’s introductory course.

Once you’re in your new notebook, the rest is easy. Simply replicate the code that you were introduced to in Kaggle’s Intro to Machine Learning course. Then there are only a couple of things you need to change:

Change the CSV file that .read_csv() reads to the dataset you chose.
Change the prediction target, y, to the variable you want to predict in your dataset.
Change the features (the X variables) that you’ll use to predict y.
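The three changes above might look like this in a notebook. This is a sketch under stated assumptions: the CSV contents, the column names (gdp_per_capita, schooling_years, health_spending, life_expectancy) and all values are hypothetical, and io.StringIO stands in for the file path you would normally pass to pd.read_csv. Rows with missing values are dropped before training, which keeps the sketch simple and works across scikit-learn versions.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical dataset; row "I" has a missing GDP value on purpose
CSV_TEXT = """country,gdp_per_capita,schooling_years,health_spending,life_expectancy
A,1200,6.1,3.2,61.5
B,5400,9.8,5.5,70.1
C,15000,12.3,7.9,76.4
D,32000,13.1,9.8,80.2
E,800,4.5,2.9,58.7
F,9800,11.0,6.3,73.5
G,45000,13.6,10.4,82.1
H,2500,7.9,4.1,65.3
I,,10.2,5.0,71.0
J,21000,12.8,8.6,78.0
"""

# Change 1: point read_csv at your own file (an in-memory stand-in here)
data = pd.read_csv(io.StringIO(CSV_TEXT))

# Change 2: the column you want to predict
TARGET = "life_expectancy"
# Change 3: the columns used to predict it
FEATURES = ["gdp_per_capita", "schooling_years", "health_spending"]

# Drop rows missing any of the columns the model needs
data = data.dropna(subset=FEATURES + [TARGET])
y = data[TARGET]
X = data[FEATURES]

# Everything below is the unchanged course-style workflow
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=1)
model.fit(train_X, train_y)
val_mae = mean_absolute_error(val_y, model.predict(val_X))
print(f"Validation MAE: {val_mae:.2f}")
```

Moving to a real dataset means replacing io.StringIO(CSV_TEXT) with the file path and editing TARGET and FEATURES; the remaining lines can stay as they are.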

And that’s it! You’ve created your very own machine learning model with a dataset that you chose yourself. It may not seem like much right now, but a few more months into your data science journey, you’ll look back at this and see how much progress you’ve made.

My First Machine Learning Model

For my first model, I wanted to create something that I thought would be relevant later in my life. I decided to use a “Used Car Dataset” from Kaggle, which has over 600,000 used car listings. The model I created aimed to predict the price of a used car based on a number of features, including the year it was built, the manufacturer, the odometer reading (number of kilometers), and more.
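A model along these lines could be sketched as below. This is not the author’s code: the ten listings and the column names year, manufacturer, odometer and price are assumptions for illustration, not rows or a schema taken from the actual Kaggle dataset. The one ingredient beyond the course code is the manufacturer column. It holds text, and scikit-learn’s random forest expects numeric features, so the sketch one-hot encodes it with pd.get_dummies and aligns a new listing to the same columns before predicting.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Made-up listings; the column names are assumed for illustration only
listings = pd.DataFrame({
    "year": [2012, 2015, 2018, 2010, 2016, 2019, 2014, 2017, 2011, 2020],
    "manufacturer": ["ford", "toyota", "honda", "ford", "toyota",
                     "honda", "ford", "toyota", "honda", "ford"],
    "odometer": [150000, 90000, 40000, 180000, 70000,
                 25000, 120000, 55000, 160000, 15000],
    "price": [6500, 12500, 18000, 4800, 14500,
              21000, 8200, 16500, 5600, 24000],
})

# The forest needs numeric inputs, so turn the text column into 0/1 integer columns
X = pd.get_dummies(listings[["year", "manufacturer", "odometer"]],
                   columns=["manufacturer"], dtype=int)
y = listings["price"]

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(train_X, train_y)
print("Validation MAE:", mean_absolute_error(val_y, model.predict(val_X)))

# Price a hypothetical new listing: encode it the same way, then reindex so it
# has exactly the training columns, filling the makes it lacks with 0
new_car = pd.DataFrame({"year": [2016], "manufacturer": ["honda"], "odometer": [60000]})
new_X = pd.get_dummies(new_car, columns=["manufacturer"], dtype=int).reindex(
    columns=X.columns, fill_value=0)
predicted_price = model.predict(new_X)[0]
print(f"Predicted price: {predicted_price:.0f}")
```

The reindex step matters because a single new listing only produces one manufacturer column, while the model was trained on all of them.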

You can see how I coded my first model here. It’s not much, but seven weeks later, I was able to improve it significantly with many more steps (see my improved model here).

Next Steps

There are a million things you can learn to improve your model. Here are some articles you can start with if you don’t know what to do next:

Thanks for the read!

If you like my work and want to support me, sign up for my email list here to be the first to hear about new and exclusive content!
