Understand the intuition behind RNN!
May 3 · 9 min read
Introduction
The goal of this article is to explore Recurrent Neural Networks in-depth, which are a kind of Neural Networks with a different architecture than the ones seen in previous articles (Link).
Concretely, the article is segmented in the following parts:
- What RNNs are
- Long Short-Term Memory (LSTM) networks
- Implementation of RNNs to temporal series
What are RNNs?
As we have seen here, CNNs do not have any kind of memory. RNNs can go beyond this limitation of ‘starting to think from scratch’ each time because they have some kind of memory.
Let’s see how they work with a very visual example:
Example
Let’s say that we live in an apartment with the perfect roommate: she cooks a different meal depending on the weather, sunny or rainy.
So, if we encode these meals as vectors:
And our Neural Network does the following:
If we recall, neural networks learn some weights that can be expressed as matrices, and those weights are used to make predictions. Ours will be as follows:
If it is a sunny day:
If it is a rainy day:
And if we take a look at our weight matrix, this time seen as a graph:
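The original figures with the meal vectors and the weight matrix are not reproduced here, but the idea can be sketched in a few lines of NumPy. The dish names and the exact matrix below are purely illustrative assumptions, not taken from the article:

import numpy as np

# Hypothetical one-hot encodings (dish names are illustrative)
sunny, rainy = np.array([1, 0]), np.array([0, 1])   # weather vectors
pizza, sushi, waffles = np.eye(3)                   # meal vectors

# Weight matrix the network would learn: rows = weather, columns = meal
W_weather_to_meal = np.array([[1, 0, 0],    # sunny -> pizza
                              [0, 1, 0]])   # rainy -> sushi

print(sunny @ W_weather_to_meal)   # [1 0 0] -> pizza
print(rainy @ W_weather_to_meal)   # [0 1 0] -> sushi

The prediction is just a matrix-vector product: the weather vector selects the row of the weight matrix that corresponds to the meal.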
Let’s now see what RNNs add, following this example:
Recurrent Neural Networks
Let’s say that now our dear roommate no longer bases the decision of what to cook on the weather, but simply looks at what she cooked yesterday.
The network in charge of predicting what the roommate will cook tomorrow, based on what she cooked today, is a Recurrent Neural Network (RNN).
This RNN can be expressed as the following matrix:
So what we have is the following:
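As another hedged sketch (the dishes and the cooking cycle are made up for illustration), the recurrence can be written as a matrix that maps yesterday’s meal to today’s meal:

import numpy as np

# Three hypothetical dishes, one-hot encoded (names are illustrative)
pizza, sushi, waffles = np.eye(3)

# Recurrence matrix: each row maps yesterday's dish to today's dish
# (a simple rotation: pizza -> sushi -> waffles -> pizza)
W_recurrent = np.array([[0, 1, 0],
                        [0, 0, 1],
                        [1, 0, 0]])

meal = pizza
for day in range(4):
    print(day, meal)
    meal = meal @ W_recurrent   # tomorrow's meal depends only on today's

The only input the network needs is its own previous output: that feedback loop is what makes it recurrent.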
Let’s Make it a Little Bit More Complex
Imagine now that your roommate decides what to cook based on what she cooked yesterday and the weather.
- If the day is sunny, she spends it on the terrace with a good beer in her hand, so she does not cook and we eat the same thing as yesterday. But
- If it rains, she stays home and cooks.
It would be something like this:
So we end up having one model that tells us what we are going to eat depending on what we ate yesterday and another model that tells us whether our roommate will cook or not.
And the add and merge operations are the following:
And here you can see the graph:
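Since the original graph is not reproduced here, the following minimal sketch shows one way the two models could be merged. The dish names, the weather encoding, and the exact merge rule are illustrative assumptions, not the article’s figures:

import numpy as np

pizza, sushi, waffles = np.eye(3)                    # hypothetical dishes
sunny, rainy = np.array([1, 0]), np.array([0, 1])    # weather vectors

# "Cook" model: on a rainy day the roommate moves to the next dish in the cycle
W_cook = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])

def tomorrows_meal(todays_meal, weather):
    """If it is sunny we repeat today's meal; if it rains the roommate cooks."""
    is_rainy = weather[1]                   # 0 on sunny days, 1 on rainy days
    cooked = todays_meal @ W_cook           # what she would cook if it rains
    # merge: keep yesterday's meal when sunny, take the new dish when rainy
    return (1 - is_rainy) * todays_meal + is_rainy * cooked

print(tomorrows_meal(pizza, sunny))   # [1. 0. 0.] -> pizza again
print(tomorrows_meal(pizza, rainy))   # [0. 1. 0.] -> sushi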
And that is how it works!
This example is from a great video, which I recommend you watch as many times as you need to internalize and understand the previous explanation. You can find the video here: https://www.youtube.com/watch?v=UNmqTiOnRfg
And what are RNNs used for?
There are several types of applications:
They are very good at making predictions, especially when our data is sequential:
Stock market forecasts
The value of a share depends largely on the values it had previously.
Sequence generation
RNNs work well whenever the data form a sequence and the value at an instant t depends on the value at the instant t-1.
Text generation
For example, when your cell phone suggests words: it looks at the last word you have written, and at the letters you are writing at that moment, to suggest the next letters or even words.
Voice recognition
In this case, the inputs are the previously recognized word and the audio that reaches us at that moment.
Long Short-Term Memory Networks
Let’s now study how the most popular RNNs work: LSTM networks. Their structure is as follows:
But first: Why are they the most popular ones?
It turns out that conventional RNNs have memory problems: despite being designed precisely to have memory, they are incapable of retaining information over the long term. And why is this a problem?
Well, going back to the problem of our roommate: for this example we just need to know what we ate yesterday, so this would not be an issue.
But imagine that, instead of a menu of three dishes, there were 60.
Conventional RNNs wouldn’t be able to remember things that happened a long time ago. However, the LSTM would!
And why?
Let’s take a look at the architecture of the RNN and the LSTM:
RNN
LSTM
It turns out that where RNNs have a single layer, LSTMs have a combination of layers that interact with each other in a very special way.
Let’s try to understand this, but first, let me explain the nomenclature:
In the diagrams above:
- A vector travels along each line, from the output of one node to the inputs of others.
- The pink circles indicate element-wise operations, such as vector sums, while the yellow boxes are neural network layers whose weights are learned during training.
- Lines that join indicate concatenation, and lines that separate indicate that the same line content travels to two different destinations.
The key idea of LSTMs
The key is the state of the cell, which is indicated in the diagram as the line that travels across the top:
The state of the cell is like a kind of conveyor belt that travels along the whole architecture of the network with very few interactions (and those are linear): this means the information can simply flow along it without being modified.
The ingenious part is that the layers of the LSTM can (or cannot) contribute information to this conveyor belt, and that decision is made by the “gates”:
The gates are nothing more than a way of carefully regulating the information that reaches the conveyor belt. They are composed of a neural network layer with a sigmoid activation and an element-wise multiplication.
Thus, the sigmoid layer outputs a number between 0 and 1 that indicates how much of that information should be let through to the conveyor belt: 0 means “let nothing through” and 1 means “let everything through”.
As you can see in the diagram, an LSTM has three such gates to protect and control the conveyor belt.
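To make the role of the gates concrete, here is a minimal NumPy sketch of a single LSTM step following the standard equations (the weights are random, just so the code runs; this is an illustration, not the Keras implementation used later):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W and b hold the weights of the four layers
    (forget gate, input gate, candidate values, output gate)."""
    z = np.concatenate([h_prev, x_t])          # lines that join = concatenation
    f = sigmoid(W['f'] @ z + b['f'])           # forget gate: what to erase from the belt
    i = sigmoid(W['i'] @ z + b['i'])           # input gate: what new info to add
    c_tilde = np.tanh(W['c'] @ z + b['c'])     # candidate new content
    c = f * c_prev + i * c_tilde               # the "conveyor belt": cell state update
    o = sigmoid(W['o'] @ z + b['o'])           # output gate: what to expose
    h = o * np.tanh(c)                         # new hidden state
    return h, c

# Toy sizes and random weights, just to run the step
n_in, n_hidden = 3, 4
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(n_hidden, n_hidden + n_in)) for k in 'fico'}
b = {k: np.zeros(n_hidden) for k in 'fico'}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):         # a sequence of 5 time steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h)

Notice that the cell state c is only touched by an element-wise multiplication (the forget gate) and an element-wise addition (the input gate), which is exactly the “few linear interactions” of the conveyor belt.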
The specific details about this operation, are greatly explained here: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
And this blog is also very interesting: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
With this in mind, let’s see what Recurrent Networks can do!
LSTM Implementation
Image Classification with LSTM
We’ll follow an example that can be found here:
https://medium.com/the-artificial-impostor/notes-understanding-tensorflow-part-2-f7e5ece849f5
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.datasets import mnist
from keras.utils import np_utils
from keras import initializers
# Hyperparameters
batch_size = 128
nb_epoch = 10

# Parameters for MNIST dataset
img_rows, img_cols = 28, 28
nb_classes = 10

# Parameters for LSTM network
nb_lstm_outputs = 30
nb_time_steps = img_rows      # each image row is fed to the LSTM as one time step
dim_input_vector = img_cols   # each row contributes 28 pixel values (features)

# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print('X_train original shape:', X_train.shape)
input_shape = (nb_time_steps, dim_input_vector)

X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# LSTM Building
model = Sequential()
model.add(LSTM(nb_lstm_outputs, input_shape=input_shape))
model.add(Dense(nb_classes, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
# Training the model
history = model.fit(X_train,
                    Y_train,
                    epochs=nb_epoch,
                    batch_size=batch_size,
                    shuffle=True,
                    validation_data=(X_test, Y_test),
                    verbose=1)
# Evaluation
evaluation = model.evaluate(X_test, Y_test, batch_size=batch_size, verbose=1)
print('Summary: Loss over the test dataset: %.2f, Accuracy: %.2f' % (evaluation[0], evaluation[1]))
Time Series Prediction with LSTM
# LSTM for international airline passengers problem with regression framing
# https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
!wget https://raw.githubusercontent.com/lazyprogrammer/machine_learning_examples/master/airline/international-airline-passengers.csv

import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# fix random seed for reproducibility
numpy.random.seed(7)

# load the dataset
dataframe = read_csv('international-airline-passengers.csv', usecols=[1], engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]

# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))

# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict

# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
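The script above only evaluates one-step-ahead predictions on the train/test split. As a purely illustrative extension (not part of the referenced tutorial), the trained model could also be iterated to forecast beyond the dataset by feeding each prediction back in as the next input. The snippet below assumes look_back = 1, as in the script:

# Hypothetical extension (not in the original tutorial): iterate the trained model
# to forecast n_future steps beyond the dataset, feeding each prediction back in.
# Assumes look_back == 1, as in the script above.
n_future = 12
window = dataset[-look_back:, 0].reshape(1, 1, look_back)   # last (scaled) observation
future = []
for _ in range(n_future):
    next_scaled = model.predict(window, verbose=0)          # shape (1, 1)
    future.append(next_scaled[0, 0])
    window = next_scaled.reshape(1, 1, look_back)           # the prediction becomes the next input
plt.plot(scaler.inverse_transform(numpy.array(future).reshape(-1, 1)))
plt.show()

Keep in mind that errors accumulate quickly when a model is fed its own predictions, so such forecasts should be read as a rough trend rather than exact values.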
Final Words
As always, I hope you enjoyed the post, and that you gained an intuition about RNNs and how to implement them!
If you liked this post then you can take a look at my other posts on Data Science and Machine Learning here.
If you want to learn more about Machine Learning, Data Science and Artificial Intelligence, follow me on Medium, and stay tuned for my next posts!