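This article explains how SeqGAN works and then walks through the SeqGAN code in detail.
How SeqGAN works:
First, a brief introduction to GAN:
A GAN consists of two parts. (The goal of a GAN is to train a generative model that fits the real data distribution so well that the discriminative model cannot tell generated samples from real ones.)
(1) Generative model: models the distribution of the real data.
(2) Discriminative model: decides whether a sample is a real sample or a generated one.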
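For reference, the two-player game described above is usually written as the minimax objective from the original GAN formulation. The symbols below (G, D, p_data, p_z) are not used elsewhere in this article and are introduced here only for illustration:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the probability the discriminator assigns to x being real, and G(z) is a sample the generator produces from noise z. The discriminator maximizes V and the generator minimizes it.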
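Next, SeqGAN:
GANs are widely used in the image domain but perform poorly on text, mainly because a GAN runs into two problems when generating sequences of discrete tokens. First, the generator's output is discrete, so it is hard to pass gradient updates from the discriminator back to the generator. Second, the discriminator can only judge a sequence once it has been fully generated, and by then the signal offers little guidance for the individual generation steps. If the discriminator instead scores the sequence while it is still being generated, balancing the score of the current partial sequence against the score of the future complete sequence becomes a difficulty of its own.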
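The first problem can be illustrated with a small pure-Python sketch. It is not part of the SeqGAN code: `toy_discriminator_score` is a made-up stand-in for the discriminator, and a greedy argmax is used in place of sampling to keep the example deterministic. A tiny change in the logits does not change the chosen token, so the numerical gradient of the score with respect to the logits is zero.

```python
import math

def softmax(logits):
    """Convert logits to probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits):
    """Discrete choice: index of the most probable token."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

def toy_discriminator_score(token):
    """Hypothetical discriminator: a fixed score per token id."""
    return [0.1, 0.9, 0.4][token]

def finite_difference(logits, index, eps=1e-4):
    """Numerical derivative of the score with respect to one logit."""
    bumped = list(logits)
    bumped[index] += eps
    before = toy_discriminator_score(pick_token(logits))
    after = toy_discriminator_score(pick_token(bumped))
    return (after - before) / eps

logits = [2.0, 0.5, -1.0]
grads = [finite_difference(logits, i) for i in range(len(logits))]
print(grads)
```

Because the score only changes when the chosen token flips, the gradient carries no information about which direction to move the logits in, which is why a different training signal is needed.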
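In the paper, the authors propose a sequence generation model, SeqGAN, to solve these two problems. They treat the generator as a stochastic policy in reinforcement learning, so SeqGAN can train it directly with policy gradient updates and sidestep the non-differentiability of the generator's discrete output. Meanwhile, the discriminator's score for a complete sequence serves as the reinforcement learning reward signal, and Monte Carlo search passes it back to the intermediate steps of sequence generation.
Concretely, the authors treat the generation of a sequence as a sequential decision process in reinforcement learning. The generative model is the agent, the sequence generated so far is the current state, the next word to generate is the action, and the discriminative model's score for the sequence is the reward.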
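The way Monte Carlo search turns a final score into a value for an intermediate state can be sketched as follows. This is a toy under stated assumptions, not the SeqGAN implementation: `toy_policy` (uniform over three tokens) and `toy_discriminator` (fraction of tokens equal to 1) are hypothetical stand-ins for the generator and the discriminator.

```python
import random

VOCAB = [0, 1, 2]
SEQ_LENGTH = 5

def toy_policy(state, rng):
    """Hypothetical generator policy: pick the next token uniformly."""
    return rng.choice(VOCAB)

def toy_discriminator(sequence):
    """Hypothetical reward: fraction of tokens equal to 1."""
    return sum(1 for t in sequence if t == 1) / len(sequence)

def rollout_value(prefix, num_rollouts, rng):
    """Estimate the value of a partial sequence (the state) by completing
    it num_rollouts times with the policy and averaging the rewards."""
    total = 0.0
    for _ in range(num_rollouts):
        sequence = list(prefix)
        while len(sequence) < SEQ_LENGTH:
            sequence.append(toy_policy(sequence, rng))
        total += toy_discriminator(sequence)
    return total / num_rollouts

rng = random.Random(0)
value_good = rollout_value([1, 1, 1], 200, rng)
value_bad = rollout_value([0, 2, 0], 200, rng)
print(value_good, value_bad)
```

A prefix that already contains tokens the reward favors gets a higher estimate, even though the reward function itself only ever sees complete sequences.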
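The model structure is shown below:
In the figure, the left side shows the training of the discriminator, which is trained on positive examples from the real data and negative examples produced by the generator; the discriminator is a CNN. The right side shows the training of the generator, which is trained by feeding the probability output by the discriminator back to it as the reward; this step uses Monte Carlo search and the policy gradient method.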
Detailed code walkthrough:
sequence_gan.py
import numpy as np
import tensorflow as tf
import random
from dataloader import Gen_Data_loader, Dis_dataloader
from generator import Generator
from discriminator import Discriminator
from rollout import ROLLOUT
from target_lstm import TARGET_LSTM
import cPickle

####
# The generator, target_lstm and rollout share the same model family (an RNN, in different variants).
# The discriminator is a CNN.
####

#########################################################################################
#  Generator Hyper-parameters
#########################################################################################
EMB_DIM = 32  # word embedding dimension
HIDDEN_DIM = 32  # hidden state dimension of the LSTM cell
SEQ_LENGTH = 20  # sequence length
START_TOKEN = 0  # start token fed to the RNN
PRE_EPOCH_NUM = 120  # supervised (maximum likelihood estimation) pre-training epochs
SEED = 88
BATCH_SIZE = 64

#########################################################################################
#  Discriminator Hyper-parameters
#########################################################################################
# the discriminator uses a 64-dimensional word embedding
dis_embedding_dim = 64
# convolution filter (window) sizes of the CNN
dis_filter_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]
# number of convolution filters for each filter size
dis_num_filters = [100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160]
dis_dropout_keep_prob = 0.75
dis_l2_reg_lambda = 0.2
dis_batch_size = 64

#########################################################################################
#  Basic Training Parameters
#########################################################################################
TOTAL_BATCH = 200
positive_file = 'save/real_data.txt'
negative_file = 'save/generator_sample.txt'
eval_file = 'save/eval_file.txt'
generated_num = 10000


# Generate text with the given model by sampling one token at a time,
# each sampled token being fed back as the next input.
def generate_samples(sess, trainable_model, batch_size, generated_num, output_file):
    # Generate Samples
    # holds the generated sequences before they are written to a txt file
    generated_samples = []
    # generate the sequences batch by batch
    for _ in range(int(generated_num / batch_size)):
        generated_samples.extend(trainable_model.generate(sess))

    # write the generated sequences to a txt file; they are used later for training
    with open(output_file, 'w') as fout:
        for poem in generated_samples:
            buffer = ' '.join([str(x) for x in poem]) + '\n'
            fout.write(buffer)


# The target loss measures how well the generated samples match the oracle distribution
# (note: it compares distributions, not individual samples).
def target_loss(sess, target_lstm, data_loader):
    # target_loss means the oracle negative log-likelihood tested with the oracle model "target_lstm"
    # For more details, please see the Section 4 in https://arxiv.org/abs/1609.05473
    nll = []
    # reset the batch pointer to 0
    data_loader.reset_pointer()

    # iterate over every batch and collect the oracle NLL of the generated samples
    for it in xrange(data_loader.num_batch):
        batch = data_loader.next_batch()
        g_loss = sess.run(target_lstm.pretrain_loss, {target_lstm.x: batch})
        nll.append(g_loss)

    return np.mean(nll)


# Pre-training of the generator, using plain maximum likelihood training of an RNN.
def pre_train_epoch(sess, trainable_model, data_loader):
    # Pre-train the generator using MLE for one epoch
    supervised_g_losses = []
    # reset the batch pointer to 0
    data_loader.reset_pointer()

    # update the model parameters with one MLE step per training batch
    for it in xrange(data_loader.num_batch):
        batch = data_loader.next_batch()
        _, g_loss = trainable_model.pretrain_step(sess, batch)
        supervised_g_losses.append(g_loss)

    return np.mean(supervised_g_losses)


def main():
    # random seeds (a question for the reader: what are they for?)
    random.seed(SEED)
    np.random.seed(SEED)
    # assert that START_TOKEN is 0
    assert START_TOKEN == 0

    # data loader for training the generator
    gen_data_loader = Gen_Data_loader(BATCH_SIZE)
    # data loader for testing
    likelihood_data_loader = Gen_Data_loader(BATCH_SIZE)  # For testing
    vocab_size = 5000
    # data loader for training the discriminator
    dis_data_loader = Dis_dataloader(BATCH_SIZE)

    # the generator, used for pre-training, adversarial training and testing
    generator = Generator(vocab_size, BATCH_SIZE, EMB_DIM, HIDDEN_DIM, SEQ_LENGTH, START_TOKEN)
    # weights of TARGET_LSTM, used below to initialize the oracle model
    target_params = cPickle.load(open('save/target_params.pkl'))
    # build TARGET_LSTM from these weights; it generates the "real" data and
    # evaluates generated samples through the target loss
    target_lstm = TARGET_LSTM(vocab_size, BATCH_SIZE, EMB_DIM, HIDDEN_DIM, SEQ_LENGTH, START_TOKEN, target_params)  # The oracle model

    # the discriminator
    discriminator = Discriminator(sequence_length=20, num_classes=2, vocab_size=vocab_size, embedding_size=dis_embedding_dim,
                                  filter_sizes=dis_filter_sizes, num_filters=dis_num_filters, l2_reg_lambda=dis_l2_reg_lambda)

    # GPU config
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    sess.run(tf.global_variables_initializer())

    # First, use the oracle model to provide the positive examples, which are sampled from the oracle data distribution
    # generate the training data with the target_lstm
    generate_samples(sess, target_lstm, BATCH_SIZE, generated_num, positive_file)
    # load the training data into the generator's data loader
    gen_data_loader.create_batches(positive_file)

    # log file
    log = open('save/experiment-log.txt', 'w')
    # pre-train generator
    print ('Start pre-training...')
    log.write('pre-training...\n')
    # pre-train the generator
    for epoch in xrange(PRE_EPOCH_NUM):
        loss = pre_train_epoch(sess, generator, gen_data_loader)
        # evaluate once every five epochs
        if epoch % 5 == 0:
            # let the generator produce samples and score them with the oracle
            generate_samples(sess, generator, BATCH_SIZE, generated_num, eval_file)
            likelihood_data_loader.create_batches(eval_file)
            test_loss = target_loss(sess, target_lstm, likelihood_data_loader)
            print ('pre-train epoch ', epoch, 'test_loss ', test_loss)
            buffer = 'epoch:\t'+ str(epoch) + '\tnll:\t' + str(test_loss) + '\n'
            log.write(buffer)

    print ('Start pre-training discriminator...')
    # Train 3 epoch on the generated data and do this for 50 times
    # pre-train the discriminator
    for _ in range(50):
        # generate fake text with the pre-trained generator
        generate_samples(sess, generator, BATCH_SIZE, generated_num, negative_file)
        # load the real and fake text into the discriminator's data loader
        dis_data_loader.load_train_data(positive_file, negative_file)
        # train the discriminator for 3 epochs on this data, then regenerate the fake data
        for _ in range(3):
            dis_data_loader.reset_pointer()
            for it in xrange(dis_data_loader.num_batch):
                x_batch, y_batch = dis_data_loader.next_batch()
                feed = {
                    discriminator.input_x: x_batch,
                    discriminator.input_y: y_batch,
                    discriminator.dropout_keep_prob: dis_dropout_keep_prob
                }
                _ = sess.run(discriminator.train_op, feed)

    ####################### Preparation is done, now for the main part #########################
    # What does ROLLOUT do? Put simply, it completes a partial sequence into a full sentence,
    # so that the discriminator, which can only judge complete sequences, can provide a reward
    # for every intermediate generation step.
    # The rollout policy is initialized from the generator's parameters; 0.8 is its update rate.
    rollout = ROLLOUT(generator, 0.8)

    print ('#########################################################################')
    print ('Start Adversarial Training...')
    log.write('adversarial training...\n')
    # Adversarial training. Schedule per iteration: one generator step, then five rounds of
    # discriminator training (three epochs each), to keep generator and discriminator balanced.
    for total_batch in range(TOTAL_BATCH):
        # Train the generator for one step
        for it in range(1):
            samples = generator.generate(sess)
            # compute the rewards from the generated samples and the discriminator
            rewards = rollout.get_reward(sess, samples, 16, discriminator)
            feed = {generator.x: samples, generator.rewards: rewards}
            # update the generator's parameters using the rewards
            _ = sess.run(generator.g_updates, feed_dict=feed)

        # Test
        # evaluate every five iterations (and on the last one), same procedure as above
        if total_batch % 5 == 0 or total_batch == TOTAL_BATCH - 1:
            generate_samples(sess, generator, BATCH_SIZE, generated_num, eval_file)
            likelihood_data_loader.create_batches(eval_file)
            test_loss = target_loss(sess, target_lstm, likelihood_data_loader)
            buffer = 'epoch:\t' + str(total_batch) + '\tnll:\t' + str(test_loss) + '\n'
            print ('total_batch: ', total_batch, 'test_loss: ', test_loss)
            log.write(buffer)

        # Update roll-out parameters
        # remember to refresh the rollout policy from the generator's parameters
        rollout.update_params()

        # Train the discriminator
        # five rounds of discriminator training
        for _ in range(5):
            # generate sentences with the current generator
            generate_samples(sess, generator, BATCH_SIZE, generated_num, negative_file)
            # load the real and fake text into the discriminator's data loader
            dis_data_loader.load_train_data(positive_file, negative_file)

            # train the discriminator for 3 epochs, then regenerate the fake data and continue
            for _ in range(3):
                # reset the batch pointer to 0
                dis_data_loader.reset_pointer()
                # train the discriminator on every batch
                for it in xrange(dis_data_loader.num_batch):
                    x_batch, y_batch = dis_data_loader.next_batch()
                    feed = {
                        discriminator.input_x: x_batch,
                        discriminator.input_y: y_batch,
                        discriminator.dropout_keep_prob: dis_dropout_keep_prob
                    }
                    _ = sess.run(discriminator.train_op, feed)

    # close
    log.close()


if __name__ == '__main__':
    main()
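The constants in sequence_gan.py imply a training schedule that is easy to miss when reading the loops. The following standalone arithmetic sketch works it out; the variable names on the left are introduced here for illustration, while the numbers come from the script above (generated_num = 10000, BATCH_SIZE = 64, TOTAL_BATCH = 200, and the 50 / 5 / 3 loop counts).

```python
generated_num = 10000
batch_size = 64
total_batch = 200

# generate_samples() rounds down to whole batches, so fewer than
# generated_num sequences are actually written to each file.
batches_per_file = int(generated_num / batch_size)
sequences_per_file = batches_per_file * batch_size

# The discriminator sees the positive file plus the negative file.
dis_batches_per_epoch = (2 * sequences_per_file) // batch_size

# Pre-training: 50 rounds of fresh negative samples, 3 epochs each.
dis_pretrain_epochs = 50 * 3

# Adversarial phase: per iteration, 1 generator step and 5 x 3 discriminator epochs.
generator_steps = total_batch * 1
dis_adversarial_epochs = total_batch * 5 * 3

print(batches_per_file, sequences_per_file, dis_batches_per_epoch)
print(dis_pretrain_epochs, generator_steps, dis_adversarial_epochs)
```

So each generated file holds 9984 sequences rather than 10000, and the discriminator is updated far more often than the generator during the adversarial phase.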
dataloader.py
import numpy as np


class Gen_Data_loader():
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.token_stream = []

    def create_batches(self, data_file):
        self.token_stream = []
        with open(data_file, 'r') as f:
            for line in f:
                line = line.strip()
                line = line.split()
                parse_line = [int(x) for x in line]
                if len(parse_line) == 20:
                    self.token_stream.append(parse_line)

        self.num_batch = int(len(self.token_stream) / self.batch_size)
        self.token_stream = self.token_stream[:self.num_batch * self.batch_size]
        self.sequence_batch = np.split(np.array(self.token_stream), self.num_batch, 0)
        self.pointer = 0

    def next_batch(self):
        ret = self.sequence_batch[self.pointer]
        self.pointer = (self.pointer + 1) % self.num_batch
        return ret

    def reset_pointer(self):
        self.pointer = 0


class Dis_dataloader():
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.sentences = np.array([])
        self.labels = np.array([])

    def load_train_data(self, positive_file, negative_file):
        # Load data
        positive_examples = []
        negative_examples = []
        with open(positive_file)as fin:
            for line in fin:
                line = line.strip()
                line = line.split()
                parse_line = [int(x) for x in line]
                positive_examples.append(parse_line)
        with open(negative_file)as fin:
            for line in fin:
                line = line.strip()
                line = line.split()
                parse_line = [int(x) for x in line]
                if len(parse_line) == 20:
                    negative_examples.append(parse_line)
        self.sentences = np.array(positive_examples + negative_examples)

        # Generate labels
        positive_labels = [[0, 1] for _ in positive_examples]
        negative_labels = [[1, 0] for _ in negative_examples]
        self.labels = np.concatenate([positive_labels, negative_labels], 0)

        # Shuffle the data
        shuffle_indices = np.random.permutation(np.arange(len(self.labels)))
        self.sentences = self.sentences[shuffle_indices]
        self.labels = self.labels[shuffle_indices]

        # Split batches
        self.num_batch = int(len(self.labels) / self.batch_size)
        self.sentences = self.sentences[:self.num_batch * self.batch_size]
        self.labels = self.labels[:self.num_batch * self.batch_size]
        self.sentences_batches = np.split(self.sentences, self.num_batch, 0)
        self.labels_batches = np.split(self.labels, self.num_batch, 0)

        self.pointer = 0

    def next_batch(self):
        ret = self.sentences_batches[self.pointer], self.labels_batches[self.pointer]
        self.pointer = (self.pointer + 1) % self.num_batch
        return ret

    def reset_pointer(self):
        self.pointer = 0
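The two ideas in dataloader.py that matter most for the rest of the code are that incomplete batches are dropped and that real sequences are labeled [0, 1] while generated ones are labeled [1, 0]. The sketch below restates both in plain Python lists instead of NumPy; `make_batches` and `label_examples` are names made up for this illustration and do not exist in the repository.

```python
def make_batches(rows, batch_size):
    """Drop the remainder and split rows into equally sized batches."""
    num_batch = len(rows) // batch_size
    usable = rows[:num_batch * batch_size]
    return [usable[i * batch_size:(i + 1) * batch_size] for i in range(num_batch)]

def label_examples(positive, negative):
    """Concatenate real and generated sequences with one-hot labels:
    [0, 1] marks a real sequence, [1, 0] marks a generated one."""
    sentences = positive + negative
    labels = [[0, 1] for _ in positive] + [[1, 0] for _ in negative]
    return sentences, labels

rows = [[i] * 3 for i in range(7)]
batches = make_batches(rows, 2)
sentences, labels = label_examples([[1, 2]], [[3, 4], [5, 6]])
print(len(batches), labels)
```

This label layout is why rollout.py later reads index 1 of the discriminator's softmax output: it is the probability that a sequence is real.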
discriminator.py
import tensorflow as tf
import numpy as np


# An alternative to tf.nn.rnn_cell._linear function, which has been removed in Tensorfow 1.0.1
# The highway layer is borrowed from https://github.com/mkroutikov/tf-lstm-char-cnn
def linear(input_, output_size, scope=None):
    '''
    Linear map: output[k] = sum_i(Matrix[k, i] * input_[i] ) + Bias[k]
    Args:
        input_: a tensor or a list of 2D, batch x n, Tensors.
        output_size: int, second dimension of W[i].
        scope: VariableScope for the created subgraph; defaults to "Linear".
    Returns:
        A 2D Tensor with shape [batch x output_size] equal to
        sum_i(input_[i] * W[i]), where W[i]s are newly created matrices.
    Raises:
        ValueError: if some of the arguments has unspecified or wrong shape.
    '''

    shape = input_.get_shape().as_list()
    if len(shape) != 2:
        raise ValueError("Linear is expecting 2D arguments: %s" % str(shape))
    if not shape[1]:
        raise ValueError("Linear expects shape[1] of arguments: %s" % str(shape))
    input_size = shape[1]

    # Now the computation.
    with tf.variable_scope(scope or "SimpleLinear"):
        matrix = tf.get_variable("Matrix", [output_size, input_size], dtype=input_.dtype)
        bias_term = tf.get_variable("Bias", [output_size], dtype=input_.dtype)

    return tf.matmul(input_, tf.transpose(matrix)) + bias_term


def highway(input_, size, num_layers=1, bias=-2.0, f=tf.nn.relu, scope='Highway'):
    """Highway Network (cf. http://arxiv.org/abs/1505.00387).
    t = sigmoid(Wy + b)
    z = t * g(Wy + b) + (1 - t) * y
    where g is nonlinearity, t is transform gate, and (1 - t) is carry gate.
    """

    with tf.variable_scope(scope):
        for idx in range(num_layers):
            g = f(linear(input_, size, scope='highway_lin_%d' % idx))

            t = tf.sigmoid(linear(input_, size, scope='highway_gate_%d' % idx) + bias)

            output = t * g + (1. - t) * input_
            input_ = output

    return output


class Discriminator(object):
    """
    A CNN for text classification.
    Uses an embedding layer, followed by a convolutional, max-pooling and softmax layer.
    """

    def __init__(
            self, sequence_length, num_classes, vocab_size,
            embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0):
        # Placeholders for input, output and dropout
        self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
        self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
        self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

        # Keeping track of l2 regularization loss (optional)
        l2_loss = tf.constant(0.0)

        with tf.variable_scope('discriminator'):

            # Embedding layer
            with tf.device('/cpu:0'), tf.name_scope("embedding"):
                self.W = tf.Variable(
                    tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
                    name="W")
                self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
                self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

            # Create a convolution + maxpool layer for each filter size
            pooled_outputs = []
            for filter_size, num_filter in zip(filter_sizes, num_filters):
                with tf.name_scope("conv-maxpool-%s" % filter_size):
                    # Convolution Layer
                    filter_shape = [filter_size, embedding_size, 1, num_filter]
                    W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
                    b = tf.Variable(tf.constant(0.1, shape=[num_filter]), name="b")
                    conv = tf.nn.conv2d(
                        self.embedded_chars_expanded,
                        W,
                        strides=[1, 1, 1, 1],
                        padding="VALID",
                        name="conv")
                    # Apply nonlinearity
                    h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
                    # Maxpooling over the outputs
                    pooled = tf.nn.max_pool(
                        h,
                        ksize=[1, sequence_length - filter_size + 1, 1, 1],
                        strides=[1, 1, 1, 1],
                        padding='VALID',
                        name="pool")
                    pooled_outputs.append(pooled)

            # Combine all the pooled features
            num_filters_total = sum(num_filters)
            self.h_pool = tf.concat(pooled_outputs, 3)
            self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])

            # Add highway
            with tf.name_scope("highway"):
                self.h_highway = highway(self.h_pool_flat, self.h_pool_flat.get_shape()[1], 1, 0)

            # Add dropout
            with tf.name_scope("dropout"):
                self.h_drop = tf.nn.dropout(self.h_highway, self.dropout_keep_prob)

            # Final (unnormalized) scores and predictions
            with tf.name_scope("output"):
                W = tf.Variable(tf.truncated_normal([num_filters_total, num_classes], stddev=0.1), name="W")
                b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
                l2_loss += tf.nn.l2_loss(W)
                l2_loss += tf.nn.l2_loss(b)
                self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
                self.ypred_for_auc = tf.nn.softmax(self.scores)
                self.predictions = tf.argmax(self.scores, 1, name="predictions")

            # Calculate mean cross-entropy loss
            with tf.name_scope("loss"):
                losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
                self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss

        self.params = [param for param in tf.trainable_variables() if 'discriminator' in param.name]
        d_optimizer = tf.train.AdamOptimizer(1e-4)
        grads_and_vars = d_optimizer.compute_gradients(self.loss, self.params, aggregation_method=2)
        self.train_op = d_optimizer.apply_gradients(grads_and_vars)
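The shapes inside the discriminator follow directly from the hyper-parameters set in sequence_gan.py. As a rough check, the sketch below computes them in plain Python, assuming the values used there (sequence length 20 and the `dis_filter_sizes` / `dis_num_filters` lists). With "VALID" padding a filter of height k over 20 tokens yields 20 - k + 1 positions, and the max-pool window of the same height reduces that to a single value per filter.

```python
sequence_length = 20
filter_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]
num_filters = [100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160]

# Number of positions each convolution produces along the sequence axis.
conv_positions = [sequence_length - k + 1 for k in filter_sizes]

# After max-pooling over all positions, each filter contributes one feature,
# so the flattened feature vector has one entry per filter.
num_filters_total = sum(num_filters)

for k, positions, n in zip(filter_sizes, conv_positions, num_filters):
    print('filter size %2d: %2d positions, %3d pooled features' % (k, positions, n))
print('flattened feature size:', num_filters_total)
```

The flattened vector of 1720 features is what passes through the highway layer, dropout, and the final two-class output layer.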
generator.py
import tensorflow as tf
from tensorflow.python.ops import tensor_array_ops, control_flow_ops


class Generator(object):
    def __init__(self, num_emb, batch_size, emb_dim, hidden_dim,
                 sequence_length, start_token,
                 learning_rate=0.01, reward_gamma=0.95):
        self.num_emb = num_emb
        self.batch_size = batch_size
        self.emb_dim = emb_dim
        self.hidden_dim = hidden_dim
        self.sequence_length = sequence_length
        self.start_token = tf.constant([start_token] * self.batch_size, dtype=tf.int32)
        self.learning_rate = tf.Variable(float(learning_rate), trainable=False)
        self.reward_gamma = reward_gamma
        self.g_params = []
        self.d_params = []
        self.temperature = 1.0
        self.grad_clip = 5.0

        self.expected_reward = tf.Variable(tf.zeros([self.sequence_length]))

        with tf.variable_scope('generator'):
            self.g_embeddings = tf.Variable(self.init_matrix([self.num_emb, self.emb_dim]))
            self.g_params.append(self.g_embeddings)
            self.g_recurrent_unit = self.create_recurrent_unit(self.g_params)  # maps h_tm1 to h_t for generator
            self.g_output_unit = self.create_output_unit(self.g_params)  # maps h_t to o_t (output token logits)

        # placeholder definition
        self.x = tf.placeholder(tf.int32, shape=[self.batch_size, self.sequence_length])  # sequence of tokens generated by generator
        self.rewards = tf.placeholder(tf.float32, shape=[self.batch_size, self.sequence_length])  # get from rollout policy and discriminator

        # processed for batch
        with tf.device("/cpu:0"):
            self.processed_x = tf.transpose(tf.nn.embedding_lookup(self.g_embeddings, self.x), perm=[1, 0, 2])  # seq_length x batch_size x emb_dim

        # Initial states
        self.h0 = tf.zeros([self.batch_size, self.hidden_dim])
        self.h0 = tf.stack([self.h0, self.h0])

        gen_o = tensor_array_ops.TensorArray(dtype=tf.float32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)
        gen_x = tensor_array_ops.TensorArray(dtype=tf.int32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)

        # Sampling loop: the token sampled at step t is fed back as the input at step t+1.
        def _g_recurrence(i, x_t, h_tm1, gen_o, gen_x):
            h_t = self.g_recurrent_unit(x_t, h_tm1)  # hidden_memory_tuple
            o_t = self.g_output_unit(h_t)  # batch x vocab , logits not prob
            log_prob = tf.log(tf.nn.softmax(o_t))
            next_token = tf.cast(tf.reshape(tf.multinomial(log_prob, 1), [self.batch_size]), tf.int32)
            x_tp1 = tf.nn.embedding_lookup(self.g_embeddings, next_token)  # batch x emb_dim
            gen_o = gen_o.write(i, tf.reduce_sum(tf.multiply(tf.one_hot(next_token, self.num_emb, 1.0, 0.0),
                                                             tf.nn.softmax(o_t)), 1))  # [batch_size] , prob
            gen_x = gen_x.write(i, next_token)  # indices, batch_size
            return i + 1, x_tp1, h_t, gen_o, gen_x

        _, _, _, self.gen_o, self.gen_x = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3, _4: i < self.sequence_length,
            body=_g_recurrence,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token), self.h0, gen_o, gen_x))

        self.gen_x = self.gen_x.stack()  # seq_length x batch_size
        self.gen_x = tf.transpose(self.gen_x, perm=[1, 0])  # batch_size x seq_length

        # supervised pretraining for generator
        g_predictions = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length,
            dynamic_size=False, infer_shape=True)

        ta_emb_x = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length)
        ta_emb_x = ta_emb_x.unstack(self.processed_x)

        # A standard RNN pass with teacher forcing: the ground-truth token is the input at every step.
        def _pretrain_recurrence(i, x_t, h_tm1, g_predictions):
            h_t = self.g_recurrent_unit(x_t, h_tm1)
            o_t = self.g_output_unit(h_t)
            g_predictions = g_predictions.write(i, tf.nn.softmax(o_t))  # batch x vocab_size
            x_tp1 = ta_emb_x.read(i)
            return i + 1, x_tp1, h_t, g_predictions

        _, _, _, self.g_predictions = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3: i < self.sequence_length,
            body=_pretrain_recurrence,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token),
                       self.h0, g_predictions))

        self.g_predictions = tf.transpose(self.g_predictions.stack(), perm=[1, 0, 2])  # batch_size x seq_length x vocab_size

        # pretraining loss
        self.pretrain_loss = -tf.reduce_sum(
            tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
            )
        ) / (self.sequence_length * self.batch_size)

        # training updates
        pretrain_opt = self.g_optimizer(self.learning_rate)

        self.pretrain_grad, _ = tf.clip_by_global_norm(tf.gradients(self.pretrain_loss, self.g_params), self.grad_clip)
        self.pretrain_updates = pretrain_opt.apply_gradients(zip(self.pretrain_grad, self.g_params))

        #######################################################################################################
        #  Unsupervised Training
        #######################################################################################################
        self.g_loss = -tf.reduce_sum(
            tf.reduce_sum(
                tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                    tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
                ), 1) * tf.reshape(self.rewards, [-1])
        )

        g_opt = self.g_optimizer(self.learning_rate)

        self.g_grad, _ = tf.clip_by_global_norm(tf.gradients(self.g_loss, self.g_params), self.grad_clip)
        self.g_updates = g_opt.apply_gradients(zip(self.g_grad, self.g_params))

    def generate(self, sess):
        outputs = sess.run(self.gen_x)
        return outputs

    def pretrain_step(self, sess, x):
        outputs = sess.run([self.pretrain_updates, self.pretrain_loss], feed_dict={self.x: x})
        return outputs

    def init_matrix(self, shape):
        return tf.random_normal(shape, stddev=0.1)

    def init_vector(self, shape):
        return tf.zeros(shape)

    def create_recurrent_unit(self, params):
        # Weights and Bias for input and hidden tensor
        self.Wi = tf.Variable(self.init_matrix([self.emb_dim, self.hidden_dim]))
        self.Ui = tf.Variable(self.init_matrix([self.hidden_dim, self.hidden_dim]))
        self.bi = tf.Variable(self.init_matrix([self.hidden_dim]))

        self.Wf = tf.Variable(self.init_matrix([self.emb_dim, self.hidden_dim]))
        self.Uf = tf.Variable(self.init_matrix([self.hidden_dim, self.hidden_dim]))
        self.bf = tf.Variable(self.init_matrix([self.hidden_dim]))

        self.Wog = tf.Variable(self.init_matrix([self.emb_dim, self.hidden_dim]))
        self.Uog = tf.Variable(self.init_matrix([self.hidden_dim, self.hidden_dim]))
        self.bog = tf.Variable(self.init_matrix([self.hidden_dim]))

        self.Wc = tf.Variable(self.init_matrix([self.emb_dim, self.hidden_dim]))
        self.Uc = tf.Variable(self.init_matrix([self.hidden_dim, self.hidden_dim]))
        self.bc = tf.Variable(self.init_matrix([self.hidden_dim]))
        params.extend([
            self.Wi, self.Ui, self.bi,
            self.Wf, self.Uf, self.bf,
            self.Wog, self.Uog, self.bog,
            self.Wc, self.Uc, self.bc])

        def unit(x, hidden_memory_tm1):
            previous_hidden_state, c_prev = tf.unstack(hidden_memory_tm1)

            # Input Gate
            i = tf.sigmoid(
                tf.matmul(x, self.Wi) +
                tf.matmul(previous_hidden_state, self.Ui) + self.bi
            )

            # Forget Gate
            f = tf.sigmoid(
                tf.matmul(x, self.Wf) +
                tf.matmul(previous_hidden_state, self.Uf) + self.bf
            )

            # Output Gate
            o = tf.sigmoid(
                tf.matmul(x, self.Wog) +
                tf.matmul(previous_hidden_state, self.Uog) + self.bog
            )

            # New Memory Cell
            c_ = tf.nn.tanh(
                tf.matmul(x, self.Wc) +
                tf.matmul(previous_hidden_state, self.Uc) + self.bc
            )

            # Final Memory cell
            c = f * c_prev + i * c_

            # Current Hidden state
            current_hidden_state = o * tf.nn.tanh(c)

            return tf.stack([current_hidden_state, c])

        return unit

    def create_output_unit(self, params):
        self.Wo = tf.Variable(self.init_matrix([self.hidden_dim, self.num_emb]))
        self.bo = tf.Variable(self.init_matrix([self.num_emb]))
        params.extend([self.Wo, self.bo])

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            # hidden_state : batch x hidden_dim
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            # output = tf.nn.softmax(logits)
            return logits

        return unit

    def g_optimizer(self, *args, **kwargs):
        return tf.train.AdamOptimizer(*args, **kwargs)
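The adversarial loss `g_loss` in generator.py is a reward-weighted negative log-likelihood: for every position, the log-probability the generator assigns to the token it actually produced is multiplied by that position's reward, and the products are summed. The plain-Python sketch below restates this for a single sequence; `reinforce_loss` is a name made up for this illustration, and the probabilities and rewards are arbitrary example values.

```python
import math

def reinforce_loss(chosen_probs, rewards, floor=1e-20):
    """Negative sum over time steps of log p(token_t) * reward_t.

    chosen_probs[t] is the probability the generator gave to the token
    it emitted at step t; probabilities are clipped to [floor, 1.0]
    before the log, mirroring tf.clip_by_value in g_loss.
    """
    total = 0.0
    for prob, reward in zip(chosen_probs, rewards):
        clipped = min(max(prob, floor), 1.0)
        total += math.log(clipped) * reward
    return -total

loss = reinforce_loss([0.5, 0.25], [1.0, 0.5])
print(loss)
```

Minimizing this loss raises the probability of tokens that received a high reward and leaves tokens with zero reward untouched, which is the policy gradient update described earlier.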
rollout.py
import tensorflow as tf
from tensorflow.python.ops import tensor_array_ops, control_flow_ops
import numpy as np


class ROLLOUT(object):
    def __init__(self, lstm, update_rate):
        self.lstm = lstm
        self.update_rate = update_rate

        self.num_emb = self.lstm.num_emb
        self.batch_size = self.lstm.batch_size
        self.emb_dim = self.lstm.emb_dim
        self.hidden_dim = self.lstm.hidden_dim
        self.sequence_length = self.lstm.sequence_length
        self.start_token = tf.identity(self.lstm.start_token)
        self.learning_rate = self.lstm.learning_rate

        self.g_embeddings = tf.identity(self.lstm.g_embeddings)
        self.g_recurrent_unit = self.create_recurrent_unit()  # maps h_tm1 to h_t for generator
        self.g_output_unit = self.create_output_unit()  # maps h_t to o_t (output token logits)

        #####################################################################################################
        # placeholder definition
        self.x = tf.placeholder(tf.int32, shape=[self.batch_size, self.sequence_length])  # sequence of tokens generated by generator
        self.given_num = tf.placeholder(tf.int32)

        # processed for batch
        with tf.device("/cpu:0"):
            self.processed_x = tf.transpose(tf.nn.embedding_lookup(self.g_embeddings, self.x), perm=[1, 0, 2])  # seq_length x batch_size x emb_dim

        ta_emb_x = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length)
        ta_emb_x = ta_emb_x.unstack(self.processed_x)

        ta_x = tensor_array_ops.TensorArray(dtype=tf.int32, size=self.sequence_length)
        ta_x = ta_x.unstack(tf.transpose(self.x, perm=[1, 0]))
        #####################################################################################################

        self.h0 = tf.zeros([self.batch_size, self.hidden_dim])
        self.h0 = tf.stack([self.h0, self.h0])

        gen_x = tensor_array_ops.TensorArray(dtype=tf.int32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)

        # When current index i < given_num, use the provided tokens as the input at each time step
        def _g_recurrence_1(i, x_t, h_tm1, given_num, gen_x):
            h_t = self.g_recurrent_unit(x_t, h_tm1)  # hidden_memory_tuple
            x_tp1 = ta_emb_x.read(i)
            gen_x = gen_x.write(i, ta_x.read(i))
            return i + 1, x_tp1, h_t, given_num, gen_x

        # When current index i >= given_num, start roll-out, use the output as time step t as the input at time step t+1
        def _g_recurrence_2(i, x_t, h_tm1, given_num, gen_x):
            h_t = self.g_recurrent_unit(x_t, h_tm1)  # hidden_memory_tuple
            o_t = self.g_output_unit(h_t)  # batch x vocab , logits not prob
            log_prob = tf.log(tf.nn.softmax(o_t))
            next_token = tf.cast(tf.reshape(tf.multinomial(log_prob, 1), [self.batch_size]), tf.int32)
            x_tp1 = tf.nn.embedding_lookup(self.g_embeddings, next_token)  # batch x emb_dim
            gen_x = gen_x.write(i, next_token)  # indices, batch_size
            return i + 1, x_tp1, h_t, given_num, gen_x

        i, x_t, h_tm1, given_num, self.gen_x = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, given_num, _4: i < given_num,
            body=_g_recurrence_1,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token), self.h0, self.given_num, gen_x))

        _, _, _, _, self.gen_x = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3, _4: i < self.sequence_length,
            body=_g_recurrence_2,
            loop_vars=(i, x_t, h_tm1, given_num, self.gen_x))

        self.gen_x = self.gen_x.stack()  # seq_length x batch_size
        self.gen_x = tf.transpose(self.gen_x, perm=[1, 0])  # batch_size x seq_length

    def get_reward(self, sess, input_x, rollout_num, discriminator):
        rewards = []
        for i in range(rollout_num):
            # given_num between 1 to sequence_length - 1 for a part completed sentence
            for given_num in range(1, self.sequence_length):
                feed = {self.x: input_x, self.given_num: given_num}
                samples = sess.run(self.gen_x, feed)
                feed = {discriminator.input_x: samples, discriminator.dropout_keep_prob: 1.0}
                ypred_for_auc = sess.run(discriminator.ypred_for_auc, feed)
                ypred = np.array([item[1] for item in ypred_for_auc])
                if i == 0:
                    rewards.append(ypred)
                else:
                    rewards[given_num - 1] += ypred

            # the last token reward
            feed = {discriminator.input_x: input_x, discriminator.dropout_keep_prob: 1.0}
            ypred_for_auc = sess.run(discriminator.ypred_for_auc, feed)
            ypred = np.array([item[1] for item in ypred_for_auc])
            if i == 0:
                rewards.append(ypred)
            else:
                # completed sentence reward
                rewards[self.sequence_length - 1] += ypred

        rewards = np.transpose(np.array(rewards)) / (1.0 * rollout_num)  # batch_size x seq_length
        return rewards

    def create_recurrent_unit(self):
        # Weights and Bias for input and hidden tensor
        self.Wi = tf.identity(self.lstm.Wi)
        self.Ui = tf.identity(self.lstm.Ui)
        self.bi = tf.identity(self.lstm.bi)

        self.Wf = tf.identity(self.lstm.Wf)
        self.Uf = tf.identity(self.lstm.Uf)
        self.bf = tf.identity(self.lstm.bf)

        self.Wog = tf.identity(self.lstm.Wog)
        self.Uog = tf.identity(self.lstm.Uog)
        self.bog = tf.identity(self.lstm.bog)

        self.Wc = tf.identity(self.lstm.Wc)
        self.Uc = tf.identity(self.lstm.Uc)
        self.bc = tf.identity(self.lstm.bc)

        def unit(x, hidden_memory_tm1):
            previous_hidden_state, c_prev = tf.unstack(hidden_memory_tm1)

            # Input Gate
            i = tf.sigmoid(
                tf.matmul(x, self.Wi) +
                tf.matmul(previous_hidden_state, self.Ui) + self.bi
            )

            # Forget Gate
            f = tf.sigmoid(
                tf.matmul(x, self.Wf) +
                tf.matmul(previous_hidden_state, self.Uf) + self.bf
            )

            # Output Gate
            o = tf.sigmoid(
                tf.matmul(x, self.Wog) +
                tf.matmul(previous_hidden_state, self.Uog) + self.bog
            )

            # New Memory Cell
            c_ = tf.nn.tanh(
                tf.matmul(x, self.Wc) +
                tf.matmul(previous_hidden_state, self.Uc) + self.bc
            )

            # Final Memory cell
            c = f * c_prev + i * c_

            # Current Hidden state
            current_hidden_state = o * tf.nn.tanh(c)

            return tf.stack([current_hidden_state, c])

        return unit

    def update_recurrent_unit(self):
        # Weights and Bias for input and hidden tensor
        self.Wi = self.update_rate * self.Wi + (1 - self.update_rate) * tf.identity(self.lstm.Wi)
        self.Ui = self.update_rate * self.Ui + (1 - self.update_rate) * tf.identity(self.lstm.Ui)
        self.bi = self.update_rate * self.bi + (1 - self.update_rate) * tf.identity(self.lstm.bi)

        self.Wf = self.update_rate * self.Wf + (1 - self.update_rate) * tf.identity(self.lstm.Wf)
        self.Uf = self.update_rate * self.Uf + (1 - self.update_rate) * tf.identity(self.lstm.Uf)
        self.bf = self.update_rate * self.bf + (1 - self.update_rate) * tf.identity(self.lstm.bf)

        self.Wog = self.update_rate * self.Wog + (1 - self.update_rate) * tf.identity(self.lstm.Wog)
        self.Uog = self.update_rate * self.Uog + (1 - self.update_rate) * tf.identity(self.lstm.Uog)
        self.bog = self.update_rate * self.bog + (1 - self.update_rate) * tf.identity(self.lstm.bog)

        self.Wc = self.update_rate * self.Wc + (1 - self.update_rate) * tf.identity(self.lstm.Wc)
        self.Uc = self.update_rate * self.Uc + (1 - self.update_rate) * tf.identity(self.lstm.Uc)
        self.bc = self.update_rate * self.bc + (1 - self.update_rate) * tf.identity(self.lstm.bc)

        def unit(x, hidden_memory_tm1):
            previous_hidden_state, c_prev = tf.unstack(hidden_memory_tm1)

            # Input Gate
            i = tf.sigmoid(
                tf.matmul(x, self.Wi) +
                tf.matmul(previous_hidden_state, self.Ui) + self.bi
            )

            # Forget Gate
            f = tf.sigmoid(
                tf.matmul(x, self.Wf) +
                tf.matmul(previous_hidden_state, self.Uf) + self.bf
            )

            # Output Gate
            o = tf.sigmoid(
                tf.matmul(x, self.Wog) +
                tf.matmul(previous_hidden_state, self.Uog) + self.bog
            )

            # New Memory Cell
            c_ = tf.nn.tanh(
                tf.matmul(x, self.Wc) +
                tf.matmul(previous_hidden_state, self.Uc) + self.bc
            )

            # Final Memory cell
            c = f * c_prev + i * c_

            # Current Hidden state
            current_hidden_state = o * tf.nn.tanh(c)

            return tf.stack([current_hidden_state, c])

        return unit

    def create_output_unit(self):
        self.Wo = tf.identity(self.lstm.Wo)
        self.bo = tf.identity(self.lstm.bo)

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            # hidden_state : batch x hidden_dim
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            # output = tf.nn.softmax(logits)
            return logits

        return unit

    def update_output_unit(self):
        self.Wo = self.update_rate * self.Wo + (1 - self.update_rate) * tf.identity(self.lstm.Wo)
        self.bo = self.update_rate * self.bo + (1 - self.update_rate) * tf.identity(self.lstm.bo)

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            # hidden_state : batch x hidden_dim
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            # output = tf.nn.softmax(logits)
            return logits

        return unit

    def update_params(self):
        self.g_embeddings = tf.identity(self.lstm.g_embeddings)
        self.g_recurrent_unit = self.update_recurrent_unit()
        self.g_output_unit = self.update_output_unit()
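The `update_recurrent_unit` and `update_output_unit` methods blend the rollout policy's weights with the generator's current weights using `update_rate` (0.8 in sequence_gan.py). The arithmetic can be sketched on plain Python lists; `blend` is a name made up for this illustration, and the weight values are arbitrary.

```python
def blend(rollout_weights, generator_weights, update_rate):
    """Keep update_rate of the old rollout weight and take the rest
    from the generator, element by element."""
    return [update_rate * r + (1 - update_rate) * g
            for r, g in zip(rollout_weights, generator_weights)]

rollout_weights = [1.0, 0.0]
generator_weights = [0.0, 1.0]

# One update moves the rollout weights 20% of the way toward the generator.
after_one = blend(rollout_weights, generator_weights, 0.8)

# Repeated updates against a fixed generator converge to the generator's weights.
current = rollout_weights
for _ in range(50):
    current = blend(current, generator_weights, 0.8)

print(after_one, current)
```

With an update rate of 0.8 the rollout policy lags behind the generator instead of copying it outright, so the reward estimates change more smoothly from one adversarial iteration to the next.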
target_lstm.py
import tensorflow as tf
from tensorflow.python.ops import tensor_array_ops, control_flow_ops


class TARGET_LSTM(object):
    def __init__(self, num_emb, batch_size, emb_dim, hidden_dim, sequence_length, start_token, params):
        self.num_emb = num_emb
        self.batch_size = batch_size
        self.emb_dim = emb_dim
        self.hidden_dim = hidden_dim
        self.sequence_length = sequence_length
        self.start_token = tf.constant([start_token] * self.batch_size, dtype=tf.int32)
        self.g_params = []
        self.temperature = 1.0
        self.params = params

        tf.set_random_seed(66)

        with tf.variable_scope('generator'):
            self.g_embeddings = tf.Variable(self.params[0])
            self.g_params.append(self.g_embeddings)
            self.g_recurrent_unit = self.create_recurrent_unit(self.g_params)  # maps h_tm1 to h_t for generator
            self.g_output_unit = self.create_output_unit(self.g_params)  # maps h_t to o_t (output token logits)

        # placeholder definition
        self.x = tf.placeholder(tf.int32, shape=[self.batch_size, self.sequence_length])  # sequence of tokens generated by generator

        # processed for batch
        with tf.device("/cpu:0"):
            self.processed_x = tf.transpose(tf.nn.embedding_lookup(self.g_embeddings, self.x), perm=[1, 0, 2])  # seq_length x batch_size x emb_dim

        # initial states
        self.h0 = tf.zeros([self.batch_size, self.hidden_dim])
        self.h0 = tf.stack([self.h0, self.h0])

        # generator on initial randomness
        gen_o = tensor_array_ops.TensorArray(dtype=tf.float32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)
        gen_x = tensor_array_ops.TensorArray(dtype=tf.int32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)

        def _g_recurrence(i, x_t, h_tm1, gen_o, gen_x):
            h_t = self.g_recurrent_unit(x_t, h_tm1)  # hidden_memory_tuple
            o_t = self.g_output_unit(h_t)  # batch x vocab , logits not prob
            log_prob = tf.log(tf.nn.softmax(o_t))
            next_token = tf.cast(tf.reshape(tf.multinomial(log_prob, 1), [self.batch_size]), tf.int32)
            x_tp1 = tf.nn.embedding_lookup(self.g_embeddings, next_token)  # batch x emb_dim
            gen_o = gen_o.write(i, tf.reduce_sum(tf.multiply(tf.one_hot(next_token, self.num_emb, 1.0, 0.0),
                                                             tf.nn.softmax(o_t)), 1))  # [batch_size] , prob
            gen_x = gen_x.write(i, next_token)  # indices, batch_size
            return i + 1, x_tp1, h_t, gen_o, gen_x

        _, _, _, self.gen_o, self.gen_x = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3, _4: i < self.sequence_length,
            body=_g_recurrence,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token), self.h0, gen_o, gen_x)
        )

        self.gen_x = self.gen_x.stack()  # seq_length x batch_size
        self.gen_x = tf.transpose(self.gen_x, perm=[1, 0])  # batch_size x seq_length

        # supervised pretraining for generator
        g_predictions = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length,
            dynamic_size=False, infer_shape=True)

        ta_emb_x = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length)
        ta_emb_x = ta_emb_x.unstack(self.processed_x)

        def _pretrain_recurrence(i, x_t, h_tm1, g_predictions):
            h_t = self.g_recurrent_unit(x_t, h_tm1)
            o_t = self.g_output_unit(h_t)
            g_predictions = g_predictions.write(i, tf.nn.softmax(o_t))  # batch x vocab_size
            x_tp1 = ta_emb_x.read(i)
            return i + 1, x_tp1, h_t, g_predictions

        _, _, _, self.g_predictions = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3: i < self.sequence_length,
            body=_pretrain_recurrence,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token),
                       self.h0, g_predictions))

        self.g_predictions = tf.transpose(
            self.g_predictions.stack(), perm=[1, 0, 2])  # batch_size x seq_length x vocab_size

        # pretraining loss
        self.pretrain_loss = -tf.reduce_sum(
            tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                tf.reshape(self.g_predictions, [-1, self.num_emb]))) / (self.sequence_length * self.batch_size)

        self.out_loss = tf.reduce_sum(
            tf.reshape(
                -tf.reduce_sum(
                    tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                        tf.reshape(self.g_predictions, [-1, self.num_emb])), 1
                ), [-1, self.sequence_length]
            ), 1
        )  # batch_size

    def generate(self, session):
        # h0 = np.random.normal(size=self.hidden_dim)
        outputs = session.run(self.gen_x)
        return outputs

    def init_matrix(self, shape):
        return tf.random_normal(shape, stddev=1.0)

    def create_recurrent_unit(self, params):
        # Weights and Bias for input and hidden tensor
        self.Wi = tf.Variable(self.params[1])
        self.Ui = tf.Variable(self.params[2])
        self.bi = tf.Variable(self.params[3])

        self.Wf = tf.Variable(self.params[4])
        self.Uf = tf.Variable(self.params[5])
        self.bf = tf.Variable(self.params[6])

        self.Wog = tf.Variable(self.params[7])
        self.Uog = tf.Variable(self.params[8])
        self.bog = tf.Variable(self.params[9])

        self.Wc = tf.Variable(self.params[10])
        self.Uc = tf.Variable(self.params[11])
        self.bc = tf.Variable(self.params[12])
        params.extend([
            self.Wi, self.Ui, self.bi,
            self.Wf, self.Uf, self.bf,
            self.Wog, self.Uog, self.bog,
            self.Wc, self.Uc, self.bc])

        def unit(x, hidden_memory_tm1):
            previous_hidden_state, c_prev = tf.unstack(hidden_memory_tm1)

            # Input Gate
            i = tf.sigmoid(
                tf.matmul(x, self.Wi) +
                tf.matmul(previous_hidden_state, self.Ui) + self.bi
            )

            # Forget Gate
            f = tf.sigmoid(
                tf.matmul(x, self.Wf) +
                tf.matmul(previous_hidden_state, self.Uf) + self.bf
            )

            # Output Gate
            o = tf.sigmoid(
                tf.matmul(x, self.Wog) +
                tf.matmul(previous_hidden_state, self.Uog) + self.bog
            )

            # New Memory Cell
            c_ = tf.nn.tanh(
                tf.matmul(x, self.Wc) +
                tf.matmul(previous_hidden_state, self.Uc) + self.bc
            )

            # Final Memory cell
            c = f * c_prev + i * c_

            # Current Hidden state
            current_hidden_state = o * tf.nn.tanh(c)

            return tf.stack([current_hidden_state, c])

        return unit

    def create_output_unit(self, params):
        self.Wo = tf.Variable(self.params[13])
        self.bo = tf.Variable(self.params[14])
        params.extend([self.Wo, self.bo])

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            # hidden_state : batch x hidden_dim
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            # output = tf.nn.softmax(logits)
            return logits

        return unit
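The two losses defined in target_lstm.py are closely related: `out_loss` sums the negative log-probability of the given tokens over each sequence (one value per batch row), while `pretrain_loss` is the same quantity summed over the whole batch and divided by `sequence_length * batch_size`, i.e. the mean per-token negative log-likelihood. The following pure-Python sketch reproduces that arithmetic on a toy batch. The function name `sequence_nll`, the 3-token vocabulary, and the probability tables are hypothetical and chosen only so the numbers are easy to check by hand.

```python
import math


def sequence_nll(token_ids, step_probs):
    """Sum of -log p(token_t) over one sequence (one row of out_loss)."""
    return -sum(math.log(probs[tok]) for tok, probs in zip(token_ids, step_probs))


# Hypothetical toy batch: 2 sequences, length 2, vocabulary of 3 tokens.
batch_tokens = [[0, 2], [1, 1]]
batch_probs = [
    [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]],   # predicted distributions, sequence 0
    [[0.25, 0.5, 0.25], [0.25, 0.25, 0.5]],   # predicted distributions, sequence 1
]

# One negative log-likelihood per sequence, as in out_loss.
out_loss = [sequence_nll(t, p) for t, p in zip(batch_tokens, batch_probs)]

# Mean per-token negative log-likelihood, as in pretrain_loss.
batch_size = len(batch_tokens)
sequence_length = len(batch_tokens[0])
pretrain_loss = sum(out_loss) / (sequence_length * batch_size)

print(out_loss)
print(pretrain_loss)
```

Multiplying a one-hot vector by the log of the predicted distribution, as the TensorFlow code does, selects exactly the log-probability that `probs[tok]` indexes here.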