Do This Additional Step and You Have Made a Generalized Machine Learning Model

Category: IT Technology · Published: 5 years ago

Implementation

The first step is to split the data into training and test sets. The training data is used for cross-validation, and the test data serves as the unseen data. After the split, we run k-fold cross-validation on the training data only; you can adjust the number of folds, k. Finally, we predict on the unseen data and check the score to see how well the model performs. The code for these steps looks like this:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report

# Split the features (X) from the target (y).
# Assumes df_preprocessed is the preprocessed DataFrame and that
# 'default' is the name of its target column.
X = df_preprocessed.drop('default', axis=1).values
y = df_preprocessed['default'].values

# Hold out a stratified test set to act as the unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

# Tune max_depth with 5-fold cross-validation on the training data only.
# GridSearchCV then refits the best model on the whole training set.
param_grid = {
    'max_depth': list(range(3, 10, 2))
}
dt = DecisionTreeClassifier(random_state=42)
clf = GridSearchCV(dt, param_grid, cv=5)
clf.fit(X_train, y_train)

# Predict the unseen data and print the scores
y_pred = clf.predict(X_test)
print(clf.score(X_test, y_test))
print(classification_report(y_test, y_pred))
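
The number of folds k is set through the cv argument of GridSearchCV. The following is a minimal sketch of how changing k affects the selected max_depth and the mean cross-validation score. It assumes a synthetic dataset from make_classification as a stand-in for the article's data, which is not available here, so the numbers it prints will not match the article's results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the article's dataset (an assumption)
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

param_grid = {"max_depth": list(range(3, 10, 2))}

# Run the same grid search with different numbers of folds
results = {}
for k in (3, 5, 10):
    search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=k)
    search.fit(X_train, y_train)
    # best_score_ is the mean cross-validated accuracy of the best setting
    results[k] = (search.best_params_["max_depth"], search.best_score_)

for k, (depth, score) in results.items():
    print(f"k={k}: best max_depth={depth}, mean CV accuracy={score:.3f}")
```

A larger k gives each fold more training data but costs more fits, so the choice is a trade-off between the stability of the estimate and run time.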

With the implementation above, I got an accuracy of about 88.3% on the test data. This suggests the model scores well and is capable of handling unseen data. The report produced by the classification_report function breaks the result down per class.
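
One way to judge whether a test score indicates generalization is to compare it with the mean cross-validation score from the training data: a small gap suggests the model behaves similarly on data it has not seen. The sketch below illustrates that comparison. It again assumes synthetic data from make_classification rather than the article's dataset, and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the article's dataset (an assumption)
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    {"max_depth": list(range(3, 10, 2))},
    cv=5,
)
search.fit(X_train, y_train)

cv_score = search.best_score_                 # mean accuracy across the 5 folds
test_score = search.score(X_test, y_test)     # accuracy on the held-out test set
gap = cv_score - test_score

print(f"CV accuracy:   {cv_score:.3f}")
print(f"Test accuracy: {test_score:.3f}")
print(f"Gap:           {gap:+.3f}")
```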

The main label that we want to predict, which is 1, scores well, with 92% precision and 71% recall. The model can still be improved by tuning more hyperparameters and by doing some feature selection and feature engineering. If you want to see my work on this, you can find it on my GitHub.
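
To make the precision and recall figures concrete, here is a small sketch of how both are derived from the confusion matrix for label 1. It uses synthetic data and a fixed max_depth as assumptions, so its scores differ from the article's. The last lines apply the standard F1 formula to the 92% precision and 71% recall reported above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the article's dataset (an assumption)
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# max_depth=5 is an arbitrary choice for this illustration
model = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

# Precision: of everything predicted as 1, how much really is 1
precision = tp / (tp + fp) if (tp + fp) else 0.0
# Recall: of everything that really is 1, how much was found
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"precision={precision:.3f}, recall={recall:.3f}")
print(precision_score(y_test, y_pred, zero_division=0),
      recall_score(y_test, y_pred, zero_division=0))

# F1 implied by the figures reported in the article (0.92 and 0.71)
article_f1 = 2 * 0.92 * 0.71 / (0.92 + 0.71)
print(f"F1 from the article's figures: {article_f1:.2f}")
```

A recall of 71% means that roughly three in ten true cases of label 1 are still missed, which is why further tuning may be worthwhile even when precision is high.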

