2. CNN Multi-Label Image Classification (CAPTCHA OCR with TensorFlow)

Category: Python · Published: 5 years ago


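The previous post implemented single-label CNN image classification (a cat vs. dog classification task).

Link: juejin.im/post/5c0739…

Coming up: the next post uses LSTM+CTC to OCR variable-length text, which is essentially a multi-label classification problem where the number of labels is not fixed.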

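Baidu Netdisk download link for the 100,000-image CAPTCHA dataset used in this post (you can also generate it yourself with the code below):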

pan.baidu.com/s/1N7bDHxIM…

The model trained with this post's code (corresponds to the model folder in the project):

pan.baidu.com/s/1GyEpLdM5…

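Project overview:

(Prerequisites: pip install captcha==0.1.1, pip install opencv-python, pip install flask, and pip install tensorflow or pip install tensorflow-gpu. The code uses the TensorFlow 1.x graph API, for example tf.placeholder and tf.Session, so install a TensorFlow 1.x release.) This post uses a CNN to OCR fixed-length, 4-character CAPTCHA images. Each generated CAPTCHA consists of 4 random uppercase letters. At its core this is a multi-label classification problem, because one image carries several labels (sample data shown in the figure below).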

[Figure: sample CAPTCHA images from the dataset]
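In practice the "several labels" are stored as one flat 0/1 vector. Each character gets a 26-way one-hot block, and the 4 blocks are concatenated into 4 × 26 = 104 values. This matches how getTrainDataset builds labels from file names such as 0_HZDZ.png. The minimal sketch below reproduces that encoding and its inverse in plain Python. label_from_filename, encode_label and decode_label are hypothetical helpers and are not part of the project.

```python
# Sketch of the multi-label target encoding (hypothetical helper functions).
WORDS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'  # same alphabet as self.words


def label_from_filename(file_name):
    """'0_HZDZ.png' -> 'HZDZ' (same parsing as getTrainDataset)."""
    return file_name.split('.')[0].split('_')[1]


def encode_label(text):
    """One 26-way one-hot block per character, concatenated (4 chars -> 104 values)."""
    return [1 if w_ == w else 0 for w in text for w_ in WORDS]


def decode_label(vector):
    """Inverse of encode_label: index of the 1 inside each 26-value block."""
    n = len(WORDS)
    return ''.join(WORDS[vector[i * n:(i + 1) * n].index(1)]
                   for i in range(len(vector) // n))


if __name__ == '__main__':
    text = label_from_filename('0_HZDZ.png')
    vec = encode_label(text)
    print(text, len(vec), sum(vec))  # HZDZ 104 4
    print(decode_label(vec))         # HZDZ
```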

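Overall training logic:

1. Feed the image into the CNN to extract features.

2. Flatten the feature maps and feed them into the fully connected (FC) layers to obtain the prediction vector.

3. Train on the prediction vector and the label vector with the sigmoid cross-entropy loss to obtain the final model. (Note: multi-label classification uses sigmoid, which scores every label independently. Single-label classification uses softmax, which makes the classes compete with each other.)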
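Step 3 relies on tf.nn.sigmoid_cross_entropy_with_logits followed by tf.reduce_mean. The loss treats each of the 104 outputs as an independent yes/no problem and then averages all of them. The sketch below is a plain-Python illustration, not TensorFlow's implementation. It uses the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)) given in the TensorFlow documentation and checks it against the textbook binary cross-entropy.

```python
# Sketch: what sigmoid_cross_entropy_with_logits + reduce_mean compute, per element.
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_ce(logit, label):
    """Numerically stable form: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def naive_sigmoid_ce(logit, label):
    """Textbook binary cross-entropy on sigmoid(x); breaks down for very large |x|."""
    p = sigmoid(logit)
    return -label * math.log(p) - (1 - label) * math.log(1 - p)


def mean_loss(logits, labels):
    """Every output is its own binary problem; average the losses over all of them."""
    losses = [sigmoid_ce(x, z) for x, z in zip(logits, labels)]
    return sum(losses) / len(losses)


if __name__ == '__main__':
    for x, z in [(2.0, 1), (-1.5, 0), (0.3, 0)]:
        print(x, z, sigmoid_ce(x, z), naive_sigmoid_ce(x, z))
```

The stable form matters because logits for confident predictions can be large. There, the textbook version would take log(0), while max(x, 0) - x * z + log1p(exp(-|x|)) stays finite.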

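Overall prediction logic:

1. Feed the image into the same CNN used for training to extract features.

2. Flatten the feature maps and feed them into the FC layers to obtain the prediction vector.

3. Apply sigmoid to the prediction vector. Because the CAPTCHA always has 4 characters, split the vector into 4 segments of 26 values. In each segment, find the position of the maximum and map it to the corresponding letter. Sigmoid is monotonic, so taking the maximum of the raw outputs gives the same letters, and this is what the code does.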
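The splitting and mapping in step 3 can be sketched in plain Python as below. decode mirrors tf.reshape(pred, [-1, 4, 26]) followed by tf.argmax(..., 2). The function names are hypothetical. The demo runs decode on both raw logits and sigmoid outputs to show why the code can skip the sigmoid when decoding.

```python
# Sketch of the decoding step: 4 blocks of 26 scores -> argmax per block -> letters.
import math

WORDS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'  # same alphabet as self.words
WORDS_NUM = 4                          # same as self.words_num


def decode(scores, words=WORDS, words_num=WORDS_NUM):
    """Take the best letter in each block of len(words) scores."""
    n = len(words)
    assert len(scores) == words_num * n
    text = ''
    for i in range(words_num):
        block = scores[i * n:(i + 1) * n]
        text += words[max(range(n), key=block.__getitem__)]  # argmax of the block
    return text


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == '__main__':
    import random
    random.seed(0)
    logits = [random.uniform(-5, 5) for _ in range(WORDS_NUM * len(WORDS))]
    probs = [sigmoid(x) for x in logits]
    # sigmoid is monotonic, so decoding logits or probabilities yields the same text
    print(decode(logits), decode(probs))
```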

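Serving it as a web service:

The Flask framework turns the whole project into a web service that can be called over HTTP. After starting the service, test it with the following URLs: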

http://127.0.0.1:5050/captchaOcr?img_path=./dataset/test/0_HZDZ.png

http://127.0.0.1:5050/captchaOcr?img_path=./dataset/test/1_CKAN.png
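Besides a browser, the service can be called from a script. Below is a minimal client sketch that uses only the Python standard library. It assumes the service is running on 127.0.0.1:5050, as configured in app.run. It also assumes the reply is the JSON object with img_path and ocr_ret keys that the route returns. build_url and ocr are hypothetical helper names. Note that img_path is a path on the server's file system, not an uploaded file.

```python
# Sketch: calling the /captchaOcr endpoint with the standard library only.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = 'http://127.0.0.1:5050/captchaOcr'  # host/port from app.run in this post


def build_url(img_path, base_url=BASE_URL):
    """Percent-encode img_path into the query string that the Flask route reads."""
    return base_url + '?' + urlencode({'img_path': img_path})


def ocr(img_path, base_url=BASE_URL, timeout=10):
    """Send the GET request and return the 'ocr_ret' field of the JSON reply."""
    with urlopen(build_url(img_path, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))['ocr_ret']


if __name__ == '__main__':
    print(build_url('./dataset/test/0_HZDZ.png'))
    # With the service running (python CnnOcr.py start):
    # print(ocr('./dataset/test/0_HZDZ.png'))
```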

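Possible future improvements:

The CNN used for feature extraction could be replaced with an RNN.

This approach can only OCR fixed-length text. A later post will use LSTM+CTC to OCR variable-length text.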

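Commands:

Generate your own CAPTCHA training set (this post generated 100,000 images; change the self.im_total_num variable to adjust the count): python CnnOcr.py create_dataset

Train on the dataset: python CnnOcr.py train

Test on a new image (the path ./dataset/test/0_HZDZ.png is hard-coded in the main block): python CnnOcr.py test

Start the HTTP service: python CnnOcr.py start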
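For orientation, below is what the train command does with the default settings in CnnOcr.__init__, worked out as plain arithmetic. It assumes the training folder holds exactly the 100,000 generated images. Under that assumption, 3,200 images are held out for validation and 96,800 are used for training. That gives 1,512 batches per epoch and 9,072 optimizer steps over 6 epochs. The learning rate follows lr * (1 - (epoch / epoch_max) ** 2).

```python
# Sketch: the schedule implied by the defaults in CnnOcr.__init__ (values copied from the code).
im_total_num = 100000
batch_size = 64
epoch_max = 6
base_lr = 1e-3

val_num = 50 * batch_size             # images held out for validation
train_num = im_total_num - val_num    # train_max_num == im_total_num, so the rest is training data
batch_max = train_num // batch_size   # optimizer steps per epoch
total_steps = epoch_max * batch_max
val_batches = val_num // batch_size   # batches averaged at each evaluation

# learning rate used during each epoch: lr * (1 - (epoch / epoch_max) ** 2)
lrs = [base_lr * (1 - (e / epoch_max) ** 2) for e in range(epoch_max)]

print(val_num, train_num, batch_max, total_steps, val_batches)  # 3200 96800 1512 9072 50
print(['%.6f' % lr for lr in lrs])
```

The decay never reaches zero. The last epoch still trains at 11/36, about 31%, of the initial rate.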

Project directory structure:

[Figure: project directory structure]

Training process:

[Figures: training process]

The full code is as follows:

# coding:utf-8

from captcha.image import ImageCaptcha
import numpy as np
import cv2
import tensorflow as tf  # TensorFlow 1.x graph API (tf.placeholder, tf.Session, queue runners)
import random, os, sys

from flask import request
from flask import Flask
import json
app = Flask(__name__)


class CnnOcr:
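    def __init__(self):
        self.epoch_max = 6  # maximum number of training epochs
        self.batch_size = 64  # images per training batch; lower it if GPU memory runs out
        self.lr = 1e-3  # initial learning rate
        self.save_epoch = 1  # save a checkpoint every save_epoch epochs


        self.im_width = 128
        self.im_height = 64
        self.im_total_num = 100000  # total number of CAPTCHA images to generate
        self.train_max_num = self.im_total_num  # maximum number of images read for training
        self.val_num = 50 * self.batch_size  # validation images (taken from the training folder); must not exceed self.train_max_num
        self.words_num = 4  # number of characters on each CAPTCHA image
        self.words = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        self.label_num = self.words_num * len(self.words)  # 4 x 26 = 104 outputs
        self.keep_drop = tf.placeholder(tf.float32)  # dropout keep probability, fed at run time
        self.x = None
        self.y = None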


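    def captchaOcr(self, img_path):
        """
        Recognize one CAPTCHA image (uses the graph and session built by test())
        :param img_path: path to the image file
        :return: the recognized string, e.g. 'HZDZ'
        """
        im = cv2.imread(img_path)
        if im is None:
            raise ValueError('cannot read image: %s' % img_path)
        # cv2 loads images as BGR, but training decodes them with TensorFlow as RGB
        im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
        im = cv2.resize(im, (self.im_width, self.im_height))
        im = np.array([im], dtype=np.float32)  # add the batch dimension
        im -= 147  # same mean subtraction as in dataset_opt
        output = self.sess.run(self.max_idx_p, feed_dict={self.x: im, self.keep_drop: 1.})
        return ''.join(self.words[int(i)] for i in output[0])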


    def test(self, img_path):
        """
        Build the inference graph, restore the trained model and OCR one image
        :param img_path: path to the image file
        :return: the recognized string
        """
        self.x = tf.placeholder(tf.float32, [None, self.im_height, self.im_width, 3])  # input images
        self.pred = self.cnnNet()
        self.output = tf.nn.sigmoid(self.pred)  # per-label probabilities (argmax below uses the logits; same result)
        self.predict = tf.reshape(self.pred, [-1, self.words_num, len(self.words)])
        self.max_idx_p = tf.argmax(self.predict, 2)  # best letter index for each position

        saver = tf.train.Saver()
        # tfconfig = tf.ConfigProto(allow_soft_placement=True)
        # tfconfig.gpu_options.per_process_gpu_memory_fraction = 0.3  # fraction of GPU memory to use
        # self.sess = tf.Session(config=tfconfig)
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())  # initialize global TF variables

        # Load the trained weights and biases
        saver.restore(self.sess, './model/CnnOcr-6')
        ret = self.captchaOcr(img_path)
        print(ret)
        return ret


    def train(self):
        x_train_list, y_train_list, x_val_list, y_val_list = self.getTrainDataset()

        print('Converting the file lists into input queues')
        x_train_list_tensor = tf.convert_to_tensor(x_train_list, dtype=tf.string)
        y_train_list_tensor = tf.convert_to_tensor(y_train_list, dtype=tf.float32)

        x_val_list_tensor = tf.convert_to_tensor(x_val_list, dtype=tf.string)
        y_val_list_tensor = tf.convert_to_tensor(y_val_list, dtype=tf.float32)

        # Slice file names and labels through ONE producer so each image stays paired with its label;
        # two separate producers feeding a multi-threaded tf.train.batch can fall out of step.
        train_queue = tf.train.slice_input_producer(tensor_list=[x_train_list_tensor, y_train_list_tensor], shuffle=False)
        val_queue = tf.train.slice_input_producer(tensor_list=[x_val_list_tensor, y_val_list_tensor], shuffle=False)

        # slice_input_producer returns [file_name, label]; dataset_opt reads element [0] of each list
        train_im, train_label = self.dataset_opt(train_queue[:1], train_queue[1:])
        train_batch = tf.train.batch(tensors=[train_im, train_label], batch_size=self.batch_size, num_threads=2)

        val_im, val_label = self.dataset_opt(val_queue[:1], val_queue[1:])
        val_batch = tf.train.batch(tensors=[val_im, val_label], batch_size=self.batch_size, num_threads=2)

        print('Starting training')
        self.learning_rate = tf.placeholder(dtype=tf.float32)  # learning rate, fed once per epoch
        self.x = tf.placeholder(tf.float32, [None, self.im_height, self.im_width, 3])  # training images
        self.y = tf.placeholder(tf.float32, [None, self.label_num])  # 0/1 label vectors
        self.pred = self.cnnNet()
        # Multi-label loss: an independent sigmoid cross-entropy for every output, averaged
        self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.pred, labels=self.y))
        self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)

        # Split the logits into words_num positions x len(words) letters, take the best letter per position
        self.predict = tf.reshape(self.pred, [-1, self.words_num, len(self.words)])
        self.max_idx_p = tf.argmax(self.predict, 2)

        self.y_predict = tf.reshape(self.y, [-1, self.words_num, len(self.words)])
        self.max_idx_l = tf.argmax(self.y_predict, 2)

        # Per-character accuracy (averaged over all positions, not over whole CAPTCHAs)
        self.correct_pred = tf.equal(self.max_idx_p, self.max_idx_l)
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_pred, tf.float32))

        with tf.Session() as self.sess:
            # Initialize all global TF variables
            self.sess.run(tf.global_variables_initializer())
            coordinator = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=self.sess, coord=coordinator)

            # Checkpoint saver (Saver.save needs the target directory to exist)
            saver = tf.train.Saver()
            if not os.path.exists('./model'):
                os.makedirs('./model')

            batch_max = len(x_train_list) // self.batch_size
            total_step = 1
            for epoch_num in range(self.epoch_max):
                lr = self.lr * (1 - (epoch_num / self.epoch_max) ** 2)  # decay the learning rate each epoch
                for batch_num in range(batch_max):
                    x_train_tmp, y_train_tmp = self.sess.run(train_batch)

                    self.sess.run(self.optimizer, feed_dict={self.x: x_train_tmp, self.y: y_train_tmp, self.learning_rate: lr, self.keep_drop: .5})

                    # Report metrics
                    if total_step % 50 == 0 or total_step == 1:
                        print()
                        print('epoch:%d/%d batch:%d/%d step:%d lr:%.10f' % ((epoch_num + 1), self.epoch_max, (batch_num + 1), batch_max, total_step, lr))

                        # Metrics on the current training batch
                        train_loss, train_acc = self.sess.run([self.loss, self.accuracy], feed_dict={self.x: x_train_tmp, self.y: y_train_tmp, self.keep_drop: 1.})
                        print('train_loss:%.10f  train_acc:%.10f' % (np.mean(train_loss), train_acc))

                        # Metrics averaged over the whole validation set
                        val_loss_list, val_acc_list = [], []
                        for i in range(self.val_num // self.batch_size):
                            x_val_tmp, y_val_tmp = self.sess.run(val_batch)
                            val_loss, val_acc = self.sess.run([self.loss, self.accuracy], feed_dict={self.x: x_val_tmp, self.y: y_val_tmp, self.keep_drop: 1.})
                            val_loss_list.append(np.mean(val_loss))
                            val_acc_list.append(np.mean(val_acc))
                        print('  val_loss:%.10f    val_acc:%.10f' % (np.mean(val_loss_list), np.mean(val_acc_list)))

                    total_step += 1

                # Save the model
                if (epoch_num + 1) % self.save_epoch == 0:
                    print('Saving model:')
                    saver.save(self.sess, './model/CnnOcr', global_step=(epoch_num + 1))
            coordinator.request_stop()
            coordinator.join(threads)



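    def cnnNet(self):
        """
        CNN: 4 conv + max-pool blocks followed by 3 fully connected layers
        :return: logits of shape [batch, label_num]
        """
        weight = {
            # input: 64x128x3 (height x width x channels)

            # block 1
            'wc1_1': tf.get_variable('wc1_1', [5, 5, 3, 32]),  # conv, output: 64x128x32
            'wc1_2': tf.get_variable('wc1_2', [5, 5, 32, 32]),  # declared but not used in the forward pass
            # max-pool, output: 32x64x32

            # block 2
            'wc2_1': tf.get_variable('wc2_1', [5, 5, 32, 64]),  # conv, output: 32x64x64
            'wc2_2': tf.get_variable('wc2_2', [5, 5, 64, 64]),  # declared but not used in the forward pass
            # max-pool, output: 16x32x64

            # block 3
            'wc3_1': tf.get_variable('wc3_1', [3, 3, 64, 64]),  # conv, output: 16x32x64
            'wc3_2': tf.get_variable('wc3_2', [3, 3, 64, 64]),  # declared but not used in the forward pass
            # max-pool, output: 8x16x64

            # block 4
            'wc4_1': tf.get_variable('wc4_1', [3, 3, 64, 64]),  # conv, output: 8x16x64
            'wc4_2': tf.get_variable('wc4_2', [3, 3, 64, 64]),  # declared but not used in the forward pass
            # max-pool, output: 4x8x64

            # fully connected layer 1 (input: flattened 4*8*64 = 2048 features)
            'wfc_1': tf.get_variable('wfc_1', [8*4*64, 2048]),

            # fully connected layer 2
            'wfc_2': tf.get_variable('wfc_2', [2048, 2048]),

            # fully connected layer 3 (output layer, one logit per label)
            'wfc_3': tf.get_variable('wfc_3', [2048, self.label_num]),
        }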

        biase = {
            # block 1 (bc1_2 is not used in the forward pass)
            'bc1_1': tf.get_variable('bc1_1', [32]),
            'bc1_2': tf.get_variable('bc1_2', [32]),

            # block 2 (bc2_2 is not used in the forward pass)
            'bc2_1': tf.get_variable('bc2_1', [64]),
            'bc2_2': tf.get_variable('bc2_2', [64]),

            # block 3 (bc3_2 is not used in the forward pass)
            'bc3_1': tf.get_variable('bc3_1', [64]),
            'bc3_2': tf.get_variable('bc3_2', [64]),

            # block 4 (bc4_2 is not used in the forward pass)
            'bc4_1': tf.get_variable('bc4_1', [64]),
            'bc4_2': tf.get_variable('bc4_2', [64]),

            # fully connected layer 1
            'bfc_1': tf.get_variable('bfc_1', [2048]),

            # fully connected layer 2
            'bfc_2': tf.get_variable('bfc_2', [2048]),

            # fully connected layer 3
            'bfc_3': tf.get_variable('bfc_3', [self.label_num]),
        }

        # block 1
        net = tf.nn.conv2d(self.x, weight['wc1_1'], [1, 1, 1, 1], padding='SAME')  # convolution
        net = tf.nn.bias_add(net, biase['bc1_1'])  # add bias
        net = tf.nn.relu(net)  # activation
        print('conv1', net)
        net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')  # 2x2 max-pool
        print('pool1', net)

        # block 2
        net = tf.nn.conv2d(net, weight['wc2_1'], [1, 1, 1, 1], padding='SAME')  # convolution
        net = tf.nn.bias_add(net, biase['bc2_1'])  # add bias
        net = tf.nn.relu(net)  # activation
        print('conv2', net)
        net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')  # 2x2 max-pool
        print('pool2', net)

        # block 3
        net = tf.nn.conv2d(net, weight['wc3_1'], [1, 1, 1, 1], padding='SAME')  # convolution
        net = tf.nn.bias_add(net, biase['bc3_1'])  # add bias
        net = tf.nn.relu(net)  # activation
        print('conv3', net)
        net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')  # 2x2 max-pool
        print('pool3', net)

        # block 4
        net = tf.nn.conv2d(net, weight['wc4_1'], [1, 1, 1, 1], padding='SAME')  # convolution
        net = tf.nn.bias_add(net, biase['bc4_1'])  # add bias
        net = tf.nn.relu(net)  # activation
        print('conv4', net)
        net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')  # 2x2 max-pool
        print('pool4', net)

        # Flatten: turn each image's feature maps into a single vector
        net = tf.reshape(net, shape=[-1, weight['wfc_1'].get_shape().as_list()[0]])
        print('flatten', net)

        # Fully connected layers
        # fc layer 1 (self.keep_drop is the dropout keep probability)
        net = tf.matmul(net, weight['wfc_1']) + biase['bfc_1']
        net = tf.nn.dropout(net, self.keep_drop)
        net = tf.nn.relu(net)

        print('fc1', net)
        # fc layer 2
        net = tf.matmul(net, weight['wfc_2']) + biase['bfc_2']
        net = tf.nn.dropout(net, self.keep_drop)
        net = tf.nn.relu(net)

        print('fc2', net)
        # fc layer 3: raw logits (sigmoid is applied inside the loss)
        net = tf.matmul(net, weight['wfc_3']) + biase['bfc_3']
        print('fc3', net)
        return net


    def getTrainDataset(self):
        """
        Build the training/validation file-name lists and their 0/1 label vectors
        (each label has words_num * len(words) = 104 entries); images are decoded and resized later, in dataset_opt
        :return: x_train_list, y_train_list, x_val_list, y_val_list
        """
        train_data_list = os.listdir('./dataset/train/')
        print('%d training images found, reading up to %d' % (len(train_data_list), self.train_max_num))
        random.shuffle(train_data_list)  # shuffle the order

        y_val_list, y_train_list = [], []
        x_val_list = train_data_list[:self.val_num]
        for x_val in x_val_list:
            words_tmp = x_val.split('.')[0].split('_')[1]  # '0_HZDZ.png' -> 'HZDZ'
            # one 26-way one-hot block per character, concatenated
            y_val_list.append([1 if _w == w else 0 for w in words_tmp for _w in self.words])

        x_train_list = train_data_list[self.val_num:self.train_max_num]
        for x_train in x_train_list:
            words_tmp = x_train.split('.')[0].split('_')[1]
            y_train_list.append([1 if _w == w else 0 for w in words_tmp for _w in self.words])

        return x_train_list, y_train_list, x_val_list, y_val_list


    def createCaptchaDataset(self):
        """
        Generate the training images, named '<index>_<label>.png' (e.g. '0_HZDZ.png')
        :return:
        """
        if not os.path.exists('./dataset/train/'):
            os.makedirs('./dataset/train/')
        image = ImageCaptcha(width=self.im_width, height=self.im_height, font_sizes=(56,))
        for i in range(self.im_total_num):
            words_tmp = ''.join(random.choice(self.words) for _ in range(self.words_num))
            im_path = './dataset/train/%d_%s.png' % (i, words_tmp)
            print(im_path)
            image.write(words_tmp, im_path)
        return True


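    def dataset_opt(self, x_train_queue, y_train_queue):
        """
        Read, decode and preprocess one image and return it together with its label
        :param x_train_queue: list whose element [0] is the image file name
        :param y_train_queue: list whose element [0] is the 0/1 label vector
        :return: (image tensor of shape [im_height, im_width, 3], label vector)
        """
        file_name = x_train_queue[0]
        contents = tf.read_file('./dataset/train/' + file_name)
        im = tf.image.decode_png(contents, channels=3)  # the generated CAPTCHAs are PNG; force 3 channels
        im = tf.image.resize_images(images=im, size=[self.im_height, self.im_width])
        im = tf.reshape(im, tf.stack([self.im_height, self.im_width, 3]))
        im -= 147  # subtract an approximate mean pixel value
        # im /= 255  # scale pixels to 0~1 to speed up convergence
        # im -= 0.5  # shift pixels to -0.5~0.5
        return im, y_train_queue[0]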




if __name__ == '__main__':
    opt_type = sys.argv[1] if len(sys.argv) > 1 else ''

    instance = CnnOcr()

    if opt_type == 'create_dataset':
        instance.createCaptchaDataset()
    elif opt_type == 'train':
        instance.train()
    elif opt_type == 'test':
        instance.test('./dataset/test/0_HZDZ.png')
    elif opt_type == 'start':
        # Build the graph and load the model once; the session then stays in memory for every request
        instance.test('./dataset/test/0_HZDZ.png')

        # Start the web service
        # http://127.0.0.1:5050/captchaOcr?img_path=./dataset/test/2_SYVD.png
        @app.route('/captchaOcr', methods=['GET'])
        def captchaOcr():
            img_path = request.args.get('img_path')
            print(img_path)
            if not img_path or not os.path.isfile(img_path):
                return json.dumps({'img_path': img_path, 'error': 'image not found'}), 400
            ret = instance.captchaOcr(img_path)
            print(ret)
            return json.dumps({'img_path': img_path, 'ocr_ret': ret})

        app.run(host='0.0.0.0', port=5050, debug=False)
    else:
        print('usage: python CnnOcr.py [create_dataset|train|test|start]')
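One detail matters when reading the training log. self.accuracy averages tf.equal(self.max_idx_p, self.max_idx_l) over all 4 positions, so train_acc and val_acc are per-character accuracies. The share of CAPTCHAs read completely correctly is lower. If character errors were independent, it would be roughly the per-character accuracy to the 4th power. The sketch below computes both metrics from predicted and true letter indices. The function names are hypothetical, and the indices in the demo are made up for illustration.

```python
# Sketch: per-character accuracy (what self.accuracy reports) vs. whole-CAPTCHA accuracy.


def char_accuracy(pred_idx, true_idx):
    """Mean of element-wise equality over every position of every image."""
    hits = [p == t for ps, ts in zip(pred_idx, true_idx) for p, t in zip(ps, ts)]
    return sum(hits) / len(hits)


def captcha_accuracy(pred_idx, true_idx):
    """Fraction of images whose predicted letters are all correct."""
    hits = [list(ps) == list(ts) for ps, ts in zip(pred_idx, true_idx)]
    return sum(hits) / len(hits)


if __name__ == '__main__':
    # letter indices (0 = 'A'); the second image has one wrong letter
    true_idx = [[7, 25, 3, 25], [2, 10, 0, 13]]  # HZDZ, CKAN
    pred_idx = [[7, 25, 3, 25], [2, 10, 0, 12]]  # HZDZ, CKAM
    print(char_accuracy(pred_idx, true_idx))     # 0.875
    print(captcha_accuracy(pred_idx, true_idx))  # 0.5
```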