Step 1: Complete Kaggle’s course on Machine Learning
Personally, I found Kaggle’s Intro to Machine Learning course to be the best resource for getting started, because it’s VERY basic: it gives you the bare minimum needed to build your first machine learning model. You might think that’s a bad thing, but for a beginner there are a few reasons why it’s good:
- It’s much easier to understand how everything connects when you’re given less information at once. Taking many small steps trumps taking large strides.
- Confidently building your first machine learning model will give you the motivation to dive deeper into your learnings. Similar to the first point, set many small attainable goals in addition to your big audacious goals.
As you go through this course, keep the following points in mind:
- Focus on understanding the snippets of code. You’ll be reusing this code later on, so it’s in your best interest if you learn the rationale behind each piece of code. Don’t worry about memorizing the code — there’s nothing wrong with referring back to this as a resource.
- Don’t stress over the fact that it only covers decision tree and random forest models. You’ll learn later on that you’re only required to change a couple of lines of code to change your machine learning model, so don’t fret.
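To make the "couple of lines" point concrete, here is a minimal sketch, assuming scikit-learn (the library the course works with) and synthetic data standing in for the course's dataset. The helper name `fit_and_score` and the generated data are assumptions made for illustration, not code from the course.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; the course uses a real CSV file).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 1, size=200)

# Hold out part of the data to validate the model on unseen rows.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)


def fit_and_score(model):
    """Fit the model on the training split and return its validation MAE."""
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Only the model object changes; the rest of the workflow is identical.
tree_mae = fit_and_score(DecisionTreeRegressor(random_state=0))
forest_mae = fit_and_score(RandomForestRegressor(random_state=0))

print(f"Decision tree MAE: {tree_mae:.2f}")
print(f"Random forest MAE: {forest_mae:.2f}")
```

Because both models share the same `fit` and `predict` interface, trying a different scikit-learn regressor later usually means changing one import and one constructor call.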
Once you finish the course, you can move on to the second step toward making your own machine learning model:
Step 2: Find a dataset on Kaggle and recreate your random forest model
Kaggle provides much more than online courses. It also has thousands of datasets that you can explore and build models with. Below are the steps required to complete your very first model.
First, go to Kaggle’s list of datasets here and pick one that interests you. Think about what variable you would like to predict. Do you want to predict life expectancy? Real estate prices? Taxi usage? The world is your oyster.
Then click on ‘New Notebook’. This is where you’ll be replicating the code that you learned from Kaggle’s introductory course.
Once you’re in your new notebook, the rest is easy. Simply replicate the code that you were introduced to in Kaggle’s Intro to Machine Learning course. Then there are only a couple of things that you need to change:
- Change the CSV file that .read_csv() reads to the dataset that you chose.
- Change the prediction variable, y, to the variable that you want to predict in the dataset that you chose.
- Change the features (the X variables) that you’ll use to predict y.
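The three changes above can be sketched as follows. This is only an illustration under stated assumptions: the tiny inline CSV, its column names (`year`, `odometer`, `price`), and its values are all made up so that the snippet runs on its own. In a Kaggle notebook you would pass the path of your chosen dataset to `pd.read_csv` instead.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Change 1: the CSV being read. This inline text is a made-up stand-in;
# replace io.StringIO(csv_text) with the path to your own dataset.
csv_text = """year,odometer,price
2008,180000,4200
2010,150000,5600
2012,120000,7900
2014,95000,10500
2015,80000,12000
2016,60000,14500
2018,40000,18900
2020,15000,24000
"""
data = pd.read_csv(io.StringIO(csv_text))

# Change 2: the prediction target y (hypothetical column name).
y = data["price"]

# Change 3: the features X used to predict y (hypothetical column names).
features = ["year", "odometer"]
X = data[features]

# Everything below stays the same as in the course workflow.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
model = RandomForestRegressor(random_state=1)
model.fit(train_X, train_y)
val_mae = mean_absolute_error(val_y, model.predict(val_X))

print(f"Validation MAE: {val_mae:.0f}")
```

With only eight rows the error figure means very little. The point is that the file, the target column, and the feature list are the only parts that depend on your dataset.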
And that’s it! You’ve created your very own machine learning model with a dataset that you chose yourself. It may not seem like much right now, but a few more months into your data science journey, you’ll look back at this and see how much progress you’ve made.
My First Machine Learning Model
For my first algorithm, I wanted to create something that I thought would be relevant later in my life. I decided to use a “Used Car Dataset” from Kaggle, which has over 600,000 used car listings. The algorithm I created aimed to predict the price of a used car based on a number of features, including the year it was built, the manufacturer, the odometer reading (number of kilometers), and more.
You can see how I coded my first model here. It’s not much, but seven weeks later I was able to improve it significantly with many more steps (see my improved model here).
Next Steps
There are a million things that you can learn to improve your model. Here are some articles that you can start with if you don’t know what to do next:
Thanks for the read!