Learn how to Solve Optimization Problems and Train your First Neural Network with the MNIST Dataset!
Introduction
The goal of this article is to define and solve practical use cases with TensorFlow. To do so, we will solve:
- An optimization problem
- A linear regression problem, where we will adjust a regression line to a dataset
- And finally, we will solve the “Hello World” of Deep Learning classification projects with the MNIST dataset.
Optimization Problem
Netflix has decided to place one of their famous posters in a building. The marketing team has decided that the advertising poster has to cover an area of 600 square meters, with a margin of 2 meters above and below and 4 meters left and right.
However, they have not been informed of the dimensions of the building’s facade. We could send an email to the owner and ask, but since we know mathematics we can figure it out easily. How can we find out the dimensions of the building?
The total area of the building is:
Width = 4 + x + 4 = x + 8
Height = 2 + y + 2 = y + 4
Area = Width × Height = (x + 8)(y + 4)
And there is the constraint: x·y = 600
This allows us to write an equation system:
x·y = 600 → x = 600/y
S(y) = (600/y + 8)(y + 4) = 600 + 8y + 4·600/y + 32 = 632 + 8y + 2400/y
In an optimization problem, the information about the slope of the function (the derivative) is used to find its minimum. We set the first derivative equal to 0 and then check that the second derivative is positive. So, in this case:
S′(y) = 8 − 2400/y²
S′′(y) = 4800/y³
S′(y) = 0 → 0 = 8 − 2400/y² → 8 = 2400/y² → y² = 2400/8 = 300 → y = √300 = √(100·3) = 10·√3 ≈ 17.32 (we discard the negative root because it has no physical meaning)
Substituting in x:
x = 600/(10·√3) = 60/√3 = (60·√3)/(√3·√3) = (60·√3)/3 = 20·√3 ≈ 34.64
Since S′′(17.32) ≈ 0.9238 > 0, we have indeed found a minimum.
Therefore, the dimensions of the building are:
Width: x + 8 = 42.64 m
Height: y + 4 = 21.32 m
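As a quick sanity check, here is a tiny piece of plain Python (not part of the original solution) that plugs these numbers back into the formulas:

import math

y_opt = math.sqrt(300)            # 10*sqrt(3) ≈ 17.32
x_opt = 600 / y_opt               # 20*sqrt(3) ≈ 34.64
print(x_opt * y_opt)              # 600.0 -> the poster area constraint holds
print(x_opt + 8, y_opt + 4)       # ≈ 42.64, 21.32 -> the facade dimensions
print((x_opt + 8) * (y_opt + 4))  # ≈ 909.1 -> the minimal facade area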
Have you seen how useful derivatives are? We just solved this problem analytically. We have been able to solve it because it was a simple problem, but there are many problems for which it is very computationally expensive to solve them analytically, so we use numerical methods. One of these methods is Gradient Descent.
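To make the idea concrete before bringing in TensorFlow, here is a minimal gradient descent sketch in plain Python for our function S(y) = 632 + 8y + 2400/y (the starting point and step size below are arbitrary choices, not taken from the original code):

def S(y):
    return 632.0 + 8.0 * y + 2400.0 / y

def dS_dy(y):
    return 8.0 - 2400.0 / y**2   # the analytic derivative S'(y)

y = 5.0               # arbitrary positive starting point
learning_rate = 0.05
for step in range(500):
    y -= learning_rate * dS_dy(y)   # move against the slope
print(y, S(y))        # converges towards y ≈ 17.32, S ≈ 909.1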
What do you say we solve this problem numerically this time, with TensorFlow? Let’s go!
import numpy as np
import tensorflow as tf

x = tf.Variable(initial_value=tf.random_uniform([1], 34, 35), name='x')
y = tf.Variable(initial_value=tf.random_uniform([1], 0., 50.), name='y')

# Loss function: S(y) = 632 + 8y + 2400/y
# (x does not appear in the loss; we will compute it from y afterwards)
s = tf.add(tf.add(632.0, tf.multiply(8.0, y)), tf.divide(2400.0, y), 's')

opt = tf.train.GradientDescentOptimizer(0.05)
train = opt.minimize(s)

sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)

old_solution = 0
tolerance = 1e-4
for step in range(500):
    sess.run(train)
    solution = sess.run(y)
    if np.abs(solution - old_solution) < tolerance:
        print("The solution is y = {}".format(old_solution))
        break
    old_solution = solution
    if step % 10 == 0:
        print(step, "y = " + str(old_solution), "s = " + str(sess.run(s)))
We have managed to calculate y using the gradient descent algorithm. Of course, we now need to calculate x by substituting it into x = 600/y.
x = 600/old_solution[0]
print(x)
Which matches our results, so it seems to work! Let’s plot the results:
import matplotlib.pyplot as plt

# Start slightly above 0 to avoid dividing by zero in 2400/y
y = np.linspace(1., 400., 500)
s = 632.0 + 8*y + 2400/y
plt.plot(y, s)

print("The function minimum is in {}".format(np.min(s)))
min_s = np.min(s)
s_min_idx = np.nonzero(s == min_s)
y_min = y[s_min_idx]
print("The y value that reaches the minimum is {}".format(y_min[0]))
Let’s See Another Example
In this case, we want to find the minimum of the function y = log(x)².
x = tf.Variable(15, name='x', dtype=tf.float32)
log_x = tf.log(x)
log_x_squared = tf.square(log_x)

optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(log_x_squared)

init = tf.initialize_all_variables()

def optimize():
    with tf.Session() as session:
        session.run(init)
        print("starting at", "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))
        for step in range(100):
            session.run(train)
            print("step", step, "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))

optimize()
Let’s plot it!
# Start slightly above 0 to avoid taking log(0)
x_values = np.linspace(0.1, 10, 100)
fx = np.log(x_values)**2
plt.plot(x_values, fx)

print("The function minimum is in {}".format(np.min(fx)))
min_fx = np.min(fx)
fx_min_idx = np.nonzero(fx == min_fx)
x_min_value = x_values[fx_min_idx]
print("The x value that reaches the minimum is {}".format(x_min_value[0]))
Let’s Solve a Linear Regression Problem
Let’s see how to fit a straight line to a dataset that represents the intelligence of every character in The Simpsons, from Ralph Wiggum to Professor Frink.
Let’s plot the distribution of intelligence against age, normalized from 0 to 1, where Maggie is the youngest and Montgomery Burns the oldest:
n_observations = 50
_, ax = plt.subplots(1, 1)
xs = np.linspace(0., 1., n_observations)
ys = 100 * np.sin(xs) + np.random.uniform(0., 50., n_observations)
ax.scatter(xs, ys)
plt.draw()
Now, we need two tf.placeholders: one for the input and another for the output of our regression algorithm. Placeholders are nodes that do not need to be assigned a value until the graph is executed.
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
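For example (a small illustrative snippet, not from the original article), a placeholder only receives its value through feed_dict when the graph is run:

with tf.Session() as sess:
    doubled = tf.multiply(X, 2.0)
    # X has no value of its own; we feed one in at execution time
    print(sess.run(doubled, feed_dict={X: [1.0, 2.0, 3.0]}))  # [2. 4. 6.]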
Let’s try to fit a linear regression line. We need two variables: the weights (W) and the bias (b). Elements of type tf.Variable need an initialization, and their type cannot be changed after being declared. What we can change is their value, using the assign method.
W = tf.Variable(tf.random_normal([1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')
Y_pred = tf.add(tf.multiply(X, W), b)
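As a side note, here is a minimal sketch of the assign method mentioned above (the value 0.5 is arbitrary):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b))          # the random initial bias
    sess.run(b.assign([0.5]))   # overwrite its value in place
    print(sess.run(b))          # [0.5]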
Let’s now define the cost function as the mean squared difference between our predictions and the real values.
loss = tf.reduce_mean(tf.pow(Y_pred - Y, 2))
We’ll now define the optimization method: gradient descent. Basically, it calculates how much each weight contributes to the total error, and updates each weight so that the total error decreases in subsequent iterations. The learning rate indicates how abruptly the weights are updated.
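Under the hood, the update rule is simply w ← w − learning_rate · ∂loss/∂w. As an illustration (not part of the original code), we could build that update ourselves with tf.gradients, using the same learning rate as below:

# One hand-written gradient descent step (illustrative only)
grad_W, grad_b = tf.gradients(loss, [W, b])
manual_update_W = W.assign(W - 0.01 * grad_W)
manual_update_b = b.assign(b - 0.01 * grad_b)
# tf.train.GradientDescentOptimizer builds exactly these updates for every trainable variable.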
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# Definition of the number of iterations; start the initialization using the GPU
n_epochs = 1000
with tf.Session() as sess:
    with tf.device("/GPU:0"):
        # We initialize now all the defined variables
        sess.run(tf.global_variables_initializer())
        # Start the fit
        prev_training_loss = 0.0
        for epoch_i in range(n_epochs):
            for (x, y) in zip(xs, ys):
                sess.run(optimizer, feed_dict={X: x, Y: y})
            W_, b_, training_loss = sess.run([W, b, loss], feed_dict={X: xs, Y: ys})
            # We print the loss every 20 epochs
            if epoch_i % 20 == 0:
                print(training_loss)
            # Ending condition
            if np.abs(prev_training_loss - training_loss) < 0.000001:
                print(W_, b_)
                break
            prev_training_loss = training_loss
        # Plot of the result
        plt.scatter(xs, ys)
        plt.plot(xs, Y_pred.eval(feed_dict={X: xs}, session=sess))
And we have it! With this regression line we will be able to predict the intelligence of any Simpsons character from their age.
MNIST Dataset
Let’s now see how to classify digit images with a logistic regression. We will use the “Hello World” of Deep Learning datasets.
Let’s import the relevant libraries and the MNIST dataset:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
We load the dataset, encoding the labels with one-hot encoding (this converts each label into a vector of length N_CLASSES, with all 0s except at the index of the class the image belongs to, which contains a 1). For example, if we have 10 classes (numbers from 0 to 9) and the label belongs to number 5: label = [0 0 0 0 0 1 0 0 0 0].
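For instance, building that one-hot vector by hand in NumPy (a tiny illustration, assuming 10 classes) looks like this:

import numpy as np

label = 5
one_hot = np.zeros(10)
one_hot[label] = 1
print(one_hot)  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]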
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

print("Train examples: {}".format(mnist.train.num_examples))
print("Test examples: {}".format(mnist.test.num_examples))
print("Validation examples: {}".format(mnist.validation.num_examples))

# Images are stored in a 2D tensor: images_number x image_pixels_vector
# Labels are stored in a 2D tensor: images_number x classes_number (one-hot)
print("Images size train: {}".format(mnist.train.images.shape))
print("Labels size train: {}".format(mnist.train.labels.shape))

# To see the range of the image values
print("Min value: {}".format(np.min(mnist.train.images)))
print("Max value: {}".format(np.max(mnist.train.images)))

# To see some images we will access a vector of the dataset and reshape it to 28x28
plt.subplot(131)
plt.imshow(np.reshape(mnist.train.images[0, :], (28, 28)), cmap='gray')
plt.subplot(132)
plt.imshow(np.reshape(mnist.train.images[27500, :], (28, 28)), cmap='gray')
plt.subplot(133)
plt.imshow(np.reshape(mnist.train.images[54999, :], (28, 28)), cmap='gray')
We have already seen a little of what the MNIST dataset consists of. Now, let’s create our regressor:
First, we create the placeholder for our input data. In this case, the input is going to be a set of vectors of size 784 (we pass several images at once to our regressor; this way, when it calculates the gradient it sweeps over several images, so the estimate is more precise than if it used only one).
n_input = 784 # Number of data features: number of pixels of the image
n_output = 10 # Number of classes: from 0 to 9
net_input = tf.placeholder(tf.float32, [None, n_input]) # We create the placeholder
Let’s define now the regression equation: y = W*x + b
W = tf.Variable(tf.zeros([n_input, n_output]))
b = tf.Variable(tf.zeros([n_output]))
As the output is multiclass, we need a function that returns the probability of an image belonging to each of the possible classes. For example, if we feed in an image of a 5, a possible output would be: [0.05 0.05 0.05 0.05 0.05 0.55 0.05 0.05 0.05 0.05], whose probabilities sum to 1, and the class with the highest probability is 5.
We apply the softmax function to normalize the output probabilities:
net_output = tf.nn.softmax(tf.matmul(net_input, W) + b)
SoftMax Function
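To see what softmax does numerically, here is a minimal NumPy version (an illustration with made-up logits):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.0, 0.5, 3.2, 0.1])
probs = softmax(logits)
print(probs, probs.sum())       # probabilities that sum to 1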
We also need a placeholder for the image label, with which we will compare our prediction. And finally, we define our loss function: the cross entropy.
y_true = tf.placeholder(tf.float32, [None, n_output])

cross_entropy = -tf.reduce_sum(y_true * tf.log(net_output))

# We check whether our prediction matches the label
idx_prediction = tf.argmax(net_output, 1)
idx_label = tf.argmax(y_true, 1)
correct_prediction = tf.equal(idx_prediction, idx_label)

# We define our measure of accuracy as the number of hits relative to the number of predicted samples
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# We minimize our loss function (the cross entropy) using gradient descent with a learning rate of 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
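To make the cross-entropy loss concrete, here is the same formula computed by hand in NumPy for a single example (illustrative numbers only):

import numpy as np

y_true_example = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])   # the image is a 5
y_pred_example = np.array([0.05, 0.05, 0.05, 0.05, 0.05,
                           0.55, 0.05, 0.05, 0.05, 0.05])   # a softmax output
print(-np.sum(y_true_example * np.log(y_pred_example)))     # ≈ 0.598, i.e. -log(0.55)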
Everything is now set up! Let’s execute the graph:
from IPython.display import clear_output

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Let's train the regressor
    batch_size = 10
    for sample_i in range(mnist.train.num_examples):
        sample_x, sample_y = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={net_input: sample_x,
                                       y_true: sample_y})

        # Let's check how the regressor is performing
        if sample_i < 50 or sample_i % 200 == 0:
            val_acc = sess.run(accuracy, feed_dict={net_input: mnist.validation.images,
                                                    y_true: mnist.validation.labels})
            print("({}/{}) Acc: {}".format(sample_i, mnist.train.num_examples, val_acc))

    # Let's show the final accuracy
    print('Test accuracy: ', sess.run(accuracy, feed_dict={net_input: mnist.test.images,
                                                           y_true: mnist.test.labels}))
We have just trained our first NEURAL NETWORK with TensorFlow!
Think a little bit about what we just did.
We have implemented a logistic regression, with this formula: y = G(Wx + b), where G = softmax() instead of the typical G = sigmoid().
If you look at the following image, which defines the perceptron (a single-layer neural network), you can see that output = activation_function(W·x). See? Only the bias seems to be missing! But notice that one of the inputs is a constant 1, so the weight w0 is not multiplied by any feature. Exactly: the weight w0 is the bias, which appears with this notation simply so everything can be implemented as a single matrix multiplication.
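A quick numerical check of that trick (illustrative numbers, not from the article): appending a constant 1 to the input and folding the bias into the weight vector gives exactly the same output as W·x + b.

import numpy as np

x_vec = np.array([0.2, 0.7])
w_vec = np.array([1.5, -0.3])
bias = 0.1

x_ext = np.concatenate(([1.0], x_vec))   # prepend the constant input 1
w_ext = np.concatenate(([bias], w_vec))  # w0 plays the role of the bias
print(np.dot(w_vec, x_vec) + bias, np.dot(w_ext, x_ext))  # both print 0.19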
So, what we have just implemented is a perceptron, with
- batch_size = 10
- 1 epoch
- gradient descent as the optimizer
- and softmax as activation function.
Final Words
As always, I hope you enjoyed the post, that you have learned how to use TensorFlow to solve linear problems, and that you have successfully trained your first Neural Network!
If you liked this post then you can take a look at my other posts on Data Science and Machine Learning here.
If you want to learn more about Machine Learning and Artificial Intelligence, follow me on Medium, and stay tuned for my next posts!