An Online Guide to Machine Learning with Python

Category: Python · Published: 5 years ago

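How can you learn machine learning and deep learning more effectively and efficiently? It comes down to two things: a reasonably complete roadmap for beginners, and tutorials paired with code. Today's recommendation is a beginner-oriented machine learning resource that offers both. It is rich in content, combines tutorials with code, and is easy to follow, which makes it well suited to getting up to speed quickly.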

First, here is the guide's GitHub repository:

https://github.com/machinelearningmindset/machine-learning-course

Introduction

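The guide aims to provide a comprehensive yet simple machine learning course using Python. Machine learning, as a tool of artificial intelligence, is one of the most widely applied fields of science, and a large body of literature on it has been published. The project's goal is to present the most important aspects of machine learning through a series of simple but thorough Python tutorials, which are built on a number of well-known machine learning frameworks such as scikit-learn. In this project you will learn: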

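What is the definition of machine learning?
When did it begin, and how have its trends evolved?
What are the categories and subcategories of machine learning?
Which machine learning algorithms are most commonly used, and how are they implemented?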

The guide's table of contents is as follows:

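Machine Learning Basics
- Linear regression
- Overfitting / underfitting
- Regularization
- Cross-validation
Supervised Learning
- Decision trees
- kNN
- Naive Bayes
- Logistic regression
- Support vector machines
Unsupervised Learning
- Clustering
- Principal component analysis (PCA)
Deep Learning
- Neural network overview
- Convolutional neural networks
- Autoencoders
- Recurrent neural networks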

Let's take a closer look at the guide.

1. Machine Learning Basics

This part covers the fundamentals of machine learning: linear regression, overfitting/underfitting, regularization, and cross-validation.
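To make the cross-validation topic concrete, here is a minimal from-scratch sketch of k-fold splitting. It is not taken from the guide: the helper names `k_fold_indices` and `cross_val_mse` are hypothetical, and the "model" being evaluated is only a predict-the-training-mean baseline chosen to keep the example short.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the shuffled indices over k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


def cross_val_mse(ys, k=5):
    """Average held-out mean squared error of a predict-the-mean baseline."""
    scores = []
    for train, test in k_fold_indices(len(ys), k):
        mean_y = sum(ys[i] for i in train) / len(train)
        mse = sum((ys[i] - mean_y) ** 2 for i in test) / len(test)
        scores.append(mse)
    return sum(scores) / len(scores)


# Hypothetical toy targets, used only to show the call.
print(cross_val_mse([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], k=3))
```

Each sample is held out exactly once, so the averaged score uses all of the data for evaluation without ever scoring a point that the baseline was fitted on.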

Each topic comes with both a theoretical explanation and a full code walkthrough. Take the linear regression section as an example:

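Sample code from the linear regression section, which generates a synthetic regression data set and plots the test split: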

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Create a data set for analysis
x, y = make_regression(n_samples=500, n_features = 1, noise=25, random_state=0)

# Split the data set into testing and training data
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

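# Plot the test split as a scatter plot (fit_reg=False skips the regression line).
# regplot expects 1-D inputs passed by keyword, so the single feature column is flattened.
sns.set_style("darkgrid")
sns.regplot(x=x_test.ravel(), y=y_test, fit_reg=False)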

# Remove ticks from the plot
plt.xticks([])
plt.yticks([])

plt.tight_layout()
plt.show()
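The snippet above only generates and plots data. As a rough illustration of the fitting step itself, the following from-scratch sketch computes an ordinary least squares line for a single feature. The function names `fit_line` and `r_squared` and the toy data are assumptions made for this example, not code from the guide. In practice, scikit-learn's `LinearRegression` performs this kind of least squares fit for you.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical noise-free toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```

On noisy data such as the `make_regression` output, the same formulas apply, but the score falls below 1 because the line cannot pass through every point.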

2. Supervised Learning

This part covers supervised learning methods: decision trees, kNN, naive Bayes, logistic regression, and support vector machines.
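As a small illustration of one of these methods, here is a from-scratch sketch of kNN classification: a query point receives the majority label of its k nearest training points. The function name `knn_predict` and the toy points are hypothetical and are not taken from the guide, which builds its tutorials on scikit-learn.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Sort the labelled training points by Euclidean distance to the query.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.2, 1.4)))
print(knn_predict(train_points, train_labels, (6.8, 7.0)))
```

There is no training step beyond storing the data, so all of the work happens at prediction time.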

For example, the support vector machine section gives a detailed introduction:

Sample code from the support vector machine section:

# All the libraries we need for linear SVM
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
# This is used for our dataset
from sklearn.datasets import load_breast_cancer


# =============================================================================
# We use the breast cancer data set that ships with sklearn.
# data holds the data points (feature values);
# target holds the class label of each data point.
# More information can be found at https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer
# =============================================================================
dataCancer = load_breast_cancer()

# The slice data[:, x:n] selects features (columns) x through n-1.
# The : part keeps all rows of the matrix, and 0:2 keeps the first 2 columns.
# To use a different pair of features, replace 0:2 with 1:3, 2:4, ... 28:30;
# the set has 30 features, so the slice can only go up to 30.
# For a 3-dimensional plot, the difference between x and n needs to be 3 instead of 2.
data = dataCancer.data[:, 0:2]
target = dataCancer.target

# =============================================================================
# Creates the linear SVM model and fits it to our data points.
# All parameters other than kernel and C keep their default values.
# You can find the other parameters at https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
# =============================================================================
model = svm.SVC(kernel = 'linear', C = 10000)
model.fit(data, target)


# plots the points 
plt.scatter(data[:, 0], data[:, 1], c=target, s=30, cmap=plt.cm.prism)

# Creates the axis bounds for the grid
axis = plt.gca()
x_limit = axis.get_xlim()
y_limit = axis.get_ylim()

# Creates a grid to evaluate model
x = np.linspace(x_limit[0], x_limit[1], 50)
y = np.linspace(y_limit[0], y_limit[1], 50)
X, Y = np.meshgrid(x, y)
xy = np.c_[X.ravel(), Y.ravel()]

# Evaluates the decision function on the grid; use model.predict instead if you are classifying more than two classes
decision_line = model.decision_function(xy).reshape(Y.shape)


# Plot the decision line (the level-0 contour of the decision function)
axis.contour(X, Y, decision_line, colors='k', levels=[0],
             linestyles=['-'])
# Show the support vectors that determine the decision line
axis.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=100,
             linewidth=1, facecolors='none', edgecolors='k')

# Shows the graph
plt.show()
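For a linear kernel, the decision function evaluated on the grid has the form f(x) = w·x + b. The contour at level 0 is the decision line, and the points where |f(x)| = 1 mark the margins. The sketch below illustrates that relationship with hypothetical values of `w` and `b`. These are assumptions for the example, not values fitted to the breast cancer data. For a fitted linear `SVC`, the corresponding values are exposed as `model.coef_` and `model.intercept_`.

```python
def decision_value(w, b, point):
    """Signed score f(x) = w . x + b of a point under a linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b


def classify(w, b, point):
    """Return 1 for points on the positive side of the decision line, else 0."""
    return 1 if decision_value(w, b, point) > 0 else 0


def inside_margin(w, b, point):
    """True when the point lies strictly between the two margin lines."""
    return abs(decision_value(w, b, point)) < 1


# Hypothetical weights and bias, chosen only for illustration.
w = (1.0, -1.0)
b = 0.5

for point in [(2.0, 1.0), (0.0, 2.0), (0.0, 0.5)]:
    print(point, decision_value(w, b, point),
          classify(w, b, point), inside_margin(w, b, point))
```

Points whose score has absolute value at or below 1 are the candidates for support vectors, which is why the circled points in the plot cluster around the decision line.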