[Machine Learning] Machine Learning Notes 09: Image Recognition with SVM


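Preface

The previous note covered the basic concepts of SVM and the general workflow. If anything is unclear, see:

[Machine Learning] Machine Learning Notes 08: SVM Algorithm Principles and Implementation

This note shows how to use an SVM for image recognition.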

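Image Recognition

Face recognition is a practical technology, yet it often feels mysterious. scikit-learn ships a face recognition example; the code is at:

http://scikit-learn.org/0.13/auto_examples/applications/face_recognition.html#example-applications-face-recognition-py

First, a brief look at what PCA and SVM do. PCA stands for principal component analysis. It extracts the main factors of variation from multivariate data, which simplifies a complex problem. The purpose of computing principal components is to project high-dimensional data into a lower-dimensional space.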

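PCA Dimensionality Reduction

PCA is mainly used for dimensionality reduction. Each sample is a multi-dimensional feature vector, and some of its elements carry little discriminative power. For example, if an element equals 1 in every sample, or stays very close to 1, using it as a feature contributes almost nothing to telling samples apart. The goal is therefore to keep the dimensions that vary a lot, i.e. those with large variance, and discard those that barely change. Strictly speaking, PCA does this along new axes that are linear combinations of the original elements, rather than by picking original elements one by one. The features that remain are the informative ones, and the computation gets cheaper as well.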
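To make the variance idea concrete, here is a minimal sketch of PCA via SVD in NumPy. The toy data, the helper name `pca_project`, and the choice of two components are assumptions made for illustration; this is not the code used in the face recognition experiment.

```python
import numpy as np


def pca_project(X, n_components):
    """Project X onto its top principal components (illustrative helper)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return X_centered @ Vt[:n_components].T, ratio


rng = np.random.default_rng(0)
# Toy data: two features vary a lot, the third is almost constant near 1.
X = np.column_stack([
    rng.normal(0.0, 5.0, 200),
    rng.normal(0.0, 2.0, 200),
    1.0 + rng.normal(0.0, 0.01, 200),
])
X_reduced, ratio = pca_project(X, n_components=2)
print(X_reduced.shape)   # samples x kept components
print(ratio.round(6))    # share of total variance per principal direction
```

The nearly constant third feature ends up with a negligible share of the variance, so dropping the last principal direction loses almost no information.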

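SVM stands for support vector machine, which earlier notes have touched on. With a kernel, an SVM uses a nonlinear mapping p to map the sample space into a high-dimensional, possibly infinite-dimensional, feature space, so that a problem that is not linearly separable in the original sample space becomes linearly separable in the feature space.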
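The effect of such a mapping can be seen on a tiny example. The sketch below uses a hypothetical explicit feature map `phi(x) = (x, x^2)` and hand-picked 1-D samples, both assumptions for illustration. A real kernel SVM never computes the mapping explicitly; it only evaluates a kernel function between pairs of samples.

```python
def phi(x):
    """Hypothetical feature map: lift a 1-D sample into 2-D as (x, x^2)."""
    return (x, x * x)


# Class +1 lies on both sides of class -1, so no single threshold on x works.
samples = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]


def separable_by_threshold(values, labels):
    """Return True if one threshold on this coordinate splits the two classes."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == -1]
    return max(neg) < min(pos) or max(pos) < min(neg)


in_input_space = separable_by_threshold(samples, labels)
# In the lifted space, the horizontal line x^2 = c is a linear separator.
in_feature_space = separable_by_threshold(
    [phi(x)[1] for x in samples], labels)
print(in_input_space, in_feature_space)  # False True
```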

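Experimental Data

The experiment uses the Labeled Faces in the Wild (LFW) dataset, which is about 200 MB. It contains more than 13,000 images of 5,749 people, 1,680 of whom have two or more photos. Website: http://vis-www.cs.umass.edu/lfw/index.html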

Implementation

1. Import the modules

from time import time
import logging
import matplotlib.pyplot as plt

# train_test_split and GridSearchCV live in sklearn.model_selection;
# the old cross_validation and grid_search modules have been removed.
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
# RandomizedPCA has been removed; use PCA with svd_solver='randomized'.
from sklearn.decomposition import PCA
from sklearn.svm import SVC
# Display progress and error messages
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')

###############################################################################
# Download the data if it is not on disk yet and load it as numpy arrays

lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# Read the shape of the image array (h and w are needed for plotting)
n_samples, h, w = lfw_people.images.shape

# For learning we use the flattened pixel data directly
# (this model ignores relative pixel position information)
X = lfw_people.data
n_features = X.shape[1]

# The label to predict is the identity of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)


###############################################################################
# Split into a training set and a test set

# Hold out 25% of the samples for testing; the fixed seed makes the
# split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)


###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as an unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150

print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(n_components=n_components, svd_solver='randomized',
          whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))

eigenfaces = pca.components_.reshape((n_components, h, w))

print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))


###############################################################################
# Train an SVM classification model

print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
# class_weight='balanced' replaces the removed 'auto' option
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)


###############################################################################
# Quantitative evaluation of the model quality on the test set

print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))

print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))


###############################################################################
# Qualitative evaluation of the predictions using matplotlib

def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())


# Plot the prediction results on a portion of the test set

def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)

prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]

plot_gallery(X_test, prediction_titles, h, w)

# Plot the gallery of the most significant eigenfaces

eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)

plt.show()
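The script above fits PCA once on the whole training set and then grid-searches the SVM on the projected data. As a variation, PCA and the SVM can be chained in a scikit-learn `Pipeline`, so that PCA is re-fitted inside each cross-validation fold. The sketch below rests on several assumptions: it swaps in the small bundled digits dataset (no download needed) in place of LFW, and the component count and parameter grid are chosen for speed, not tuned.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumption: the bundled 8x8 digit images stand in for the LFW faces.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

pipeline = Pipeline([
    ("pca", PCA(n_components=20, whiten=True, random_state=0)),
    ("svc", SVC(kernel="rbf", class_weight="balanced")),
])
# The "svc__" prefix routes each parameter to the pipeline step of that name.
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": [0.01, 0.05, 0.1]}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

# The fitted search applies PCA and the SVM in one call on raw test data.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print("test accuracy: %.3f" % test_accuracy)
```

With this layout there is no separate `X_train_pca` / `X_test_pca` bookkeeping, and the held-out part of each cross-validation fold never influences the PCA basis.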

Results

[Figure: screenshot of the experimental results]
