9 Things You Should Know about Scikit-Learn 0.23

Category: IT Technology · Published: 5 years ago


9 Changes to Note

1. Python 3.6 or newer only :snake:

Python 3.6 or newer is required to use scikit-learn 0.23. No more Python 3.5. It’s a good excuse to upgrade, if you need one.

2. Interactive pipeline graphics ⬇

You can now visualize your pipeline in an interactive graphic, right inside your notebook. The data flows from top to bottom. Technically, you’re visualizing a DAG (Directed Acyclic Graph). Here’s an example:


You can see the code that accompanies this article on GitHub here.

Just add the following code, make and fit a pipeline, and the graph appears! :grinning:

import sklearn
sklearn.set_config(display="diagram")

Pipelines are a great feature of scikit-learn! Pipelines and ColumnTransformers are powerful, but they can be tricky for newcomers to grasp. These diagrams can help folks learn and understand what’s happening faster. :clap:

A ColumnTransformer object allows different transformations to be applied to different features. I suggest you create them with the make_column_transformer convenience function. Note that if you pass columns through unaltered with the passthrough argument, that doesn’t show up on the DAG. :point_up:
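Here’s a minimal sketch of how the pieces above might fit together. The toy array and the choice of StandardScaler, OneHotEncoder, and LogisticRegression are my assumptions for illustration, not the code from the article’s repo.

```python
import numpy as np
from sklearn import set_config
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Ask scikit-learn to render estimators as diagrams in notebooks
set_config(display="diagram")

# Made-up data: column 0 is numeric, column 1 holds category codes
X = np.array([[1.0, 0], [2.0, 1], [3.0, 0], [4.0, 2], [5.0, 1], [6.0, 2]])
y = np.array([0, 0, 0, 1, 1, 1])

# Scale the numeric column and one-hot encode the categorical one
preprocess = make_column_transformer(
    (StandardScaler(), [0]),
    (OneHotEncoder(handle_unknown="ignore"), [1]),
)

pipe = make_pipeline(preprocess, LogisticRegression())
pipe.fit(X, y)

# In a notebook, leaving `pipe` as the last expression of a cell
# should render the interactive diagram instead of the text repr.
pipe
```

Outside a notebook the pipeline still fits and predicts as usual; the display setting only changes how the estimator is rendered.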

3. Poisson and gamma GLMs have arrived :tada:

The Poisson and Gamma generalized linear models can be imported with linear_model.PoissonRegressor and linear_model.GammaRegressor, respectively. Now you shouldn’t need to leave scikit-learn for scipy.stats or statsmodels if you need this functionality.

Poisson regression is often appropriate for count data, and gamma regression is often appropriate when predicting the time between two Poisson events. If you are looking for more information on when to use a gamma GLM, see this Cross Validated post.
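As a rough sketch of what using the two new estimators could look like, here they are fit on synthetic data. The data-generating choices and the alpha value are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import GammaRegressor, PoissonRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(200, 2))

# Synthetic count target whose mean is log-linear in the features
y_counts = rng.poisson(np.exp(0.5 + X[:, 0] - 0.5 * X[:, 1]))
poisson_model = PoissonRegressor(alpha=1e-3).fit(X, y_counts)

# Synthetic strictly positive, continuous target (e.g. waiting times)
y_positive = rng.gamma(shape=2.0, scale=np.exp(X[:, 0]) / 2.0)
gamma_model = GammaRegressor(alpha=1e-3).fit(X, y_positive)

# Both models use a log link, so predictions are always positive
count_predictions = poisson_model.predict(X)
time_predictions = gamma_model.predict(X)
```

Note that GammaRegressor expects a strictly positive target, while PoissonRegressor accepts non-negative values such as counts.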

4. fit() doesn’t show you everything :no_entry_sign:

When you call fit(), the estimator representation that gets printed no longer lists every parameter. Only the parameters that you changed from their defaults are shown.

To show all the parameters, as in earlier versions, run this code:

sklearn.set_config(print_changed_only=False)

Alternatively, just call the get_params() method on the estimator to see the parameters. :grinning:
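To make the difference concrete, here’s a small sketch that captures both representations of the same estimator. LogisticRegression and C=10.0 are arbitrary choices for the demo.

```python
from sklearn import set_config
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=10.0)

# Default since 0.23: only non-default parameters appear
set_config(print_changed_only=True)
short_repr = repr(clf)

# Older behavior: every parameter appears
set_config(print_changed_only=False)
full_repr = repr(clf)

# Restore the default so the rest of the session is unaffected
set_config(print_changed_only=True)

# get_params() returns the full parameter dict regardless of the setting
params = clf.get_params()

print(short_repr)
print(full_repr)
```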

5. n_features_in_ shows you how many features :1234:

Most estimators now expose the n_features_in_ attribute to display how many features were passed to the fit() method.

Note that in a pipeline with OneHotEncoder, n_features_in_ will show you how many features go in, not how many are fit to the final model. :point_up:

6. Easier sample dataset loading

Most sample datasets can be loaded into a pandas DataFrame more easily. Just pass the argument as_frame=True. Then the .data attribute is a DataFrame. For example, here’s how you can load the diabetes dataset:

from sklearn.datasets import load_diabetes

diabetes = load_diabetes(as_frame=True)
df_diabetes = diabetes.data

Loading datasets from scikit-learn used to be a bit of a pain. It’s easier now, but still not as easy as seaborn. Note that load_boston() returns the Boston Housing dataset but doesn’t implement as_frame yet. :point_up:


Boston. Source: pixabay.com

7. Avoid type hinting errors :warning:

Scikit-learn now works with mypy without raising errors. If you’re using type hinting, this is nice. :grinning:

8. Improvements to experimental classes

HistGradientBoostingRegressor and HistGradientBoostingClassifier, the two LightGBM-inspired tree ensemble algorithms, are still experimental. They still need to be specially imported. However, they received a number of improvements. Same with IterativeImputer: it’s still experimental and has been improved.
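“Specially imported” means you first import an enabling module from sklearn.experimental. Here’s a minimal sketch with IterativeImputer on a made-up array. In 0.23 the histogram gradient boosting estimators follow the same pattern with enable_hist_gradient_boosting.

```python
import numpy as np

# The enabling import must come before importing the estimator itself
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up data where the second column is roughly twice the first
X = np.array(
    [[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0], [np.nan, 10.0]]
)

imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Without the enabling import, importing IterativeImputer from sklearn.impute raises an ImportError.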

9. Plays nicer with new pandas dtype :panda_face:

Speaking of scikit-learn imputers, they now accept the pandas nullable integer dtype with missing values. See my article on what’s new in pandas 1.0 to learn about those. The continued friction reduction between pandas and scikit-learn is music to my ears.
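As a quick sketch of what that looks like, here’s SimpleImputer on a one-column DataFrame that uses the nullable Int64 dtype. The column name and values are hypothetical, and the mean strategy is just one option.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# "visits" is a hypothetical column; None becomes pd.NA under Int64
df = pd.DataFrame({"visits": pd.array([1, None, 3], dtype="Int64")})

imputer = SimpleImputer(strategy="mean")
filled = imputer.fit_transform(df)

# The result is a float NumPy array with the missing value
# replaced by the column mean
print(filled)
```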


Music. Source: pixabay.com

The full release notes for version 0.23.0 are available here. The docs are the current stable docs as of this writing and they are available here.

Wrap

You’ve seen the 9 most important changes in scikit-learn version 0.23.0. Now you can impress your friends and colleagues with your knowledge. :wink:

I hope you found this guide to be helpful. If you did, please share it on your favorite social media so other folks can find it, too. :grinning:

I write about Python, SQL, Docker, data science, and other tech topics. If any of that’s of interest to you, follow me and read more here. If you want to stay up on the latest data science tips and tools, subscribe to my Data Awesome mailing list.


Happy sklearning! :rocket:

