Deep Learning in Practice: A Technical Look at MNIST Dataset Preprocessing

Category: Programming Tools · Published: 5 years ago

Summary: This article was first published on the WeChat official account "算法与编程之美". Follow it to catch further articles in this series.

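The MNIST dataset can be downloaded from https://s3.amazonaws.com/img-datasets/mnist.npz. The downloaded file uses the npz format, an archive format specific to the numpy library.

numpy can serialize a numpy.array to a file on disk, and later deserialize that file to restore the original array directly.

The stored files come in two main forms: *.npy and *.npz.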

Basic usage of npy

import numpy as np

a = np.arange(3)
np.save('test-a', a)  # np.save appends .npy when it is missing, so the file on disk is test-a.npy
aa = np.load('test-a.npy')
print(aa)  # [0 1 2]
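The round trip above uses a one-dimensional integer array. As a minimal sketch of the same idea, the example below checks that dtype and shape also survive serialization. The file name `matrix.npy` and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np

# A 2x3 float32 matrix, chosen so that dtype and shape are both non-default.
m = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.npy")  # hypothetical file name
    np.save(path, m)
    restored = np.load(path)

print(restored.dtype, restored.shape)  # float32 (2, 3)
print(np.array_equal(m, restored))     # True
```

Because the npy file records dtype and shape alongside the raw data, no extra metadata has to be passed to `np.load`.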

Basic usage of npz

When several arrays need to be saved in a single file, use the npz format.

import numpy as np

a = np.arange(3)
b = np.arange(3, 6)
# Each keyword argument becomes the key under which that array is stored.
np.savez('test-ab.npz', a=a, b=b)
data = np.load('test-ab.npz')
print(data['a'])  # [0 1 2]
print(data['b'])  # [3 4 5]
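To make the archive nature of npz more concrete, here is a small sketch that saves the same two arrays and then looks inside the file. It relies on an npz file being an ordinary zip archive with one .npy member per array. The temporary directory is an assumption used to keep the example self-contained.

```python
import os
import tempfile
import zipfile

import numpy as np

a = np.arange(3)
b = np.arange(3, 6)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test-ab.npz")
    np.savez(path, a=a, b=b)

    # An .npz file is a zip archive holding one .npy member per array.
    with zipfile.ZipFile(path) as zf:
        members = sorted(zf.namelist())

    # The loaded object lists its keys in .files; arrays are read on access,
    # so copy out what is needed before the file is closed.
    with np.load(path) as data:
        keys = sorted(data.files)
        total = int(data["a"].sum() + data["b"].sum())

print(members)  # ['a.npy', 'b.npy']
print(keys)     # ['a', 'b']
print(total)    # 15
```

Opening the file with `with np.load(...)` closes the underlying archive once the block ends.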

With the basics of npy and npz covered, we now turn to how Keras loads the MNIST dataset.

from tensorflow import keras
import numpy as np

fname = 'mnist.npz'
# get_file downloads the file if needed and returns the path of the local copy.
path = keras.utils.get_file(fname=fname,
                            origin='https://s3.amazonaws.com/img-datasets/mnist.npz')

with np.load(path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

print(x_train.shape)  # (60000, 28, 28)
print(x_test.shape)   # (10000, 28, 28)

Note: by default, Keras stores downloaded datasets under the ~/.keras/datasets/ directory.
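As a rough sketch of what that default location means in practice, the snippet below builds the expected path and checks whether the file is already there. The helper `default_keras_dataset_path` is hypothetical and not part of Keras, and it assumes the default location from the note. A customized setup may store files elsewhere.

```python
import os


def default_keras_dataset_path(fname):
    """Hypothetical helper: expected location under ~/.keras/datasets/."""
    return os.path.join(os.path.expanduser("~"), ".keras", "datasets", fname)


path = default_keras_dataset_path("mnist.npz")
print(path)
print("cached" if os.path.exists(path) else "not downloaded yet")
```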

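As the shapes show, the MNIST pipeline converts the 28x28x1 image files into four numpy arrays: x_train, y_train, x_test and y_test. These four arrays are then written to disk to produce the mnist.npz file.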
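The packaging step described above can be sketched with synthetic data. The example below is not the script that produced mnist.npz. It only imitates the layout with tiny random arrays. The array sizes, the uint8 dtype, and the file name `mnist-like.npz` are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-in data with an MNIST-like layout: 28x28 images, labels 0-9.
# The counts (6 and 2) are arbitrary; the real file holds 60000 and 10000.
x_train = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
y_train = rng.integers(0, 10, size=6, dtype=np.uint8)
x_test = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)
y_test = rng.integers(0, 10, size=2, dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mnist-like.npz")  # hypothetical file name
    np.savez(path, x_train=x_train, y_train=y_train,
             x_test=x_test, y_test=y_test)

    # Read everything back under the same four keys.
    with np.load(path) as f:
        loaded = {key: f[key] for key in f.files}

print(sorted(loaded))           # ['x_test', 'x_train', 'y_test', 'y_train']
print(loaded["x_train"].shape)  # (6, 28, 28)
```

Storing the four arrays under fixed keys is what lets any later loader recover them by name.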

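To use the dataset, Keras's get_file() first downloads the npz file from the given URL, and the file is then loaded into two tuples. Below is the load_data() function for MNIST as provided by Keras: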

# Imports added here so the excerpt runs on its own.
import numpy as np
from tensorflow.keras.utils import get_file


def load_data(path='mnist.npz'):
    """Loads the MNIST dataset.

    # Arguments
        path: path where to cache the dataset locally
            (relative to ~/.keras/datasets).

    # Returns
        Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.
    """
    # Download (or reuse the cached copy) and verify it against file_hash.
    path = get_file(path,
                    origin='https://s3.amazonaws.com/img-datasets/mnist.npz',
                    file_hash='8a61469f7ea1b51cbae51d4f78837e45')
    with np.load(path, allow_pickle=True) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    return (x_train, y_train), (x_test, y_test)
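The `file_hash` value passed to get_file() is 32 hexadecimal characters long, which matches the length of an MD5 digest. Assuming it is indeed an MD5 checksum, the following sketch shows how such a digest can be computed for a local file. The helper `md5_of_file` and the sample file are hypothetical, and this is not how Keras performs the check internally.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Hypothetical helper: MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")  # stand-in for a downloaded file
    with open(path, "wb") as fh:
        fh.write(b"hello")
    checksum = md5_of_file(path)

print(checksum)       # 5d41402abc4b2a76b9719d911017c592
print(len(checksum))  # 32
```

Comparing the result for a downloaded mnist.npz against the `file_hash` string would indicate whether the local copy is intact, under the MD5 assumption stated above.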

END

Editor-in-chief | 张祯悦

Editor | chen

where2 go team
