The Keras mode, the eager mode and the graph mode
If you, just like me, are anything like a normal person, you have probably experienced how you sometimes get so caught up in developing your application that it is hard to find a moment to stop and think about whether you are doing things as efficiently as you can: are we using the right tools? Which framework best suits my use case? Is this approach extensible? Are we keeping scalability in mind?
This is especially true in the AI field. We all know AI is a rapidly moving field. New research is published every day. There is a fierce fight between the major AI frameworks, which are being developed at a high pace. New hardware architectures, chips and optimizations are released to support the growing adoption of AI… However, despite all the bells and whistles, sometimes you need to stop and reconsider.
When is it a good moment to stop and reconsider? Only you will know. For me, this moment came very recently. I have been using Keras and TensorFlow 1.x (TF1) both at work and for my personal projects since I started in this field. I am completely in love with the high-level approach of the Keras library and the lower-level approach of TensorFlow, which lets you change things under the hood when you need more customization.
Although I have been a huge fan of this Keras-TensorFlow marriage, there has always been one very specific downside that kept this couple far from idyllic: the debugging features. As you already know, TensorFlow follows the paradigm of defining the computational graph first, compiling it afterwards (or moving it to the GPU) and then running it very efficiently. This paradigm is very nice and makes sense technically speaking but, once the model is on the GPU, it is almost impossible to debug.
This is why, after a while, and given that it has been roughly a year since TensorFlow 2.0 was published in its alpha version, I decided to take a shot at TensorFlow 2.1 (I could have started with TF2.0, but we all know we love new software) and share with you how it went.
TensorFlow 2.1
The sad truth is that I had a hard time figuring out how I was supposed to use this new TensorFlow version, the famous 2.1 stable version. I know, there are plenty of tutorials, notebooks and code gists… However, I found that the difficulty was not in the programming, since in the end it is just Python, but in the paradigm shift. To put it simply: TensorFlow 2 programming differs from TensorFlow 1 in the same way object-oriented programming differs from functional programming.
After doing some experiments I found that in TensorFlow 2.1 there are 3 approaches for building models:
- The Keras mode (tf.keras)
- The eager mode
- The graph mode (tf.function)
So enough of the boring! Show me code!
The Keras Mode
This is the standard usage we are all used to: just plain Keras with a custom loss function implementing a squared error loss. The network is three Dense layers deep.
    from tensorflow.keras.layers import Input, Dense

    # The network
    x = Input(shape=[20])
    h = Dense(units=20, activation='relu')(x)
    h = Dense(units=10, activation='relu')(h)
    y = Dense(units=1)(h)
The objective here is to teach a network to sum a vector of 20 elements. So we feed the network a dataset of shape [10000 x 20], that is, 10000 samples with 20 features each (the elements to sum):
    import tensorflow as tf

    # Training samples
    train_samples = tf.random.normal(shape=(10000, 20))
    train_targets = tf.reduce_sum(train_samples, axis=-1)
    test_samples = tf.random.normal(shape=(100, 20))
    test_targets = tf.reduce_sum(test_samples, axis=-1)
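The compile and fit calls are not part of the snippets above. As a minimal sketch of how the whole Keras-mode example could be wired together, assume the Adam optimizer, a batch size of 32, a hand-written loss called squared_error (a hypothetical name) and a smaller dataset so it runs quickly. None of these choices is confirmed by the snippets above:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

tf.random.set_seed(0)

# Same toy task, with fewer samples (assumption) to keep the run short.
# keepdims=True gives targets of shape (N, 1), matching the network output,
# so the subtraction in the loss does not broadcast to (N, N) by accident.
train_samples = tf.random.normal(shape=(1000, 20))
train_targets = tf.reduce_sum(train_samples, axis=-1, keepdims=True)
test_samples = tf.random.normal(shape=(100, 20))
test_targets = tf.reduce_sum(test_samples, axis=-1, keepdims=True)

# The network: three Dense layers
x = Input(shape=(20,))
h = Dense(units=20, activation='relu')(x)
h = Dense(units=10, activation='relu')(h)
y = Dense(units=1)(h)
model = Model(inputs=x, outputs=y)


def squared_error(y_true, y_pred):
    # Hypothetical custom loss: element-wise squared error.
    # Keras averages it over the batch.
    return tf.square(y_true - y_pred)


# Optimizer and batch size are assumptions made for this sketch
model.compile(optimizer='adam', loss=squared_error)
history = model.fit(train_samples, train_targets,
                    batch_size=32, epochs=3,
                    validation_data=(test_samples, test_targets),
                    verbose=0)
print(history.history['loss'])
print(history.history['val_loss'])
```

Setting verbose=1 instead would print the per-epoch progress bars.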
We can run this example and we get the usual nice looking Keras output:
    Epoch 1/10
    10000/10000 [==============================] - 10s 1ms/sample - loss: 1.6754 - val_loss: 0.0481
    Epoch 2/10
    10000/10000 [==============================] - 10s 981us/sample - loss: 0.0227 - val_loss: 0.0116
    Epoch 3/10
    10000/10000 [==============================] - 10s 971us/sample - loss: 0.0101 - val_loss: 0.0070
So what’s happening here, you might ask? Well, nothing, just a Keras toy example training at 10 s per epoch (on an NVIDIA GTX 1080 Ti). What about the programming paradigm? Same as before: just like in TF1.x, you define the graph, then you run it by calling keras.models.Model.fit. And the debugging features? Same as before… none. You cannot even set a simple breakpoint in the loss function.
After running this, you might be asking yourself a very obvious question: where on earth are all the nice features the TensorFlow 2 release promised? And you would be right. If the integration with the Keras package just means not having to install an additional package… what is the advantage?
On top of that, there is an even more important question: where are the famous, long-awaited debugging features? Fortunately, this is where the eager mode comes to the rescue.
The Eager Mode
What if I told you there is a way to build your models interactively, with access to all the operations at runtime? If you are shaking with excitement, it means you have suffered the deep pain of a runtime error in a random batch after 10 epochs… Yes, I know, I have been there too; we can start calling ourselves brothers in arms after those battles.
Well, yes, this is the operation mode you were looking for. In this mode, all tensor operations are interactive: you can set a breakpoint and get access to any of the intermediate tensor variables. However, this flexibility comes at a cost: more explicit code. Let’s take a look:
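The full listing is not reproduced here, so what follows is only a minimal sketch of what an eager-mode training loop for the same toy task could look like, not the original code. The Adam optimizer, the batch size of 32, the reduced dataset size and the helper name loss_compute are assumptions made for illustration:

```python
import tensorflow as tf

tf.random.set_seed(0)

# Toy data: learn to sum 20 numbers (small dataset so the sketch runs fast)
train_samples = tf.random.normal(shape=(256, 20))
train_targets = tf.reduce_sum(train_samples, axis=-1, keepdims=True)
test_samples = tf.random.normal(shape=(64, 20))
test_targets = tf.reduce_sum(test_samples, axis=-1, keepdims=True)

# The same three-layer network, built with the functional API
inputs = tf.keras.Input(shape=(20,))
h = tf.keras.layers.Dense(20, activation='relu')(inputs)
h = tf.keras.layers.Dense(10, activation='relu')(h)
outputs = tf.keras.layers.Dense(1)(h)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

optimizer = tf.keras.optimizers.Adam()  # assumption: any optimizer would do


def loss_compute(y_true, y_pred):
    # Mean squared error over the batch
    return tf.reduce_mean(tf.square(y_true - y_pred))


train_data = tf.data.Dataset.from_tensor_slices(
    (train_samples, train_targets)).batch(32)

epoch_losses = []
for epoch in range(3):
    train_metric = tf.keras.metrics.Mean()  # fresh running mean per epoch
    for x, y_true in train_data:
        # Forward pass, recorded by the tape
        with tf.GradientTape() as tape:
            y_pred = model(x)
            loss = loss_compute(y_true, y_pred)
        # Backward pass and weight update
        gradients = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))
        train_metric(loss)
    # Validation: forward pass only, no weight update
    val_loss = loss_compute(test_targets, model(test_samples))
    epoch_losses.append(float(train_metric.result()))
    print('Epoch %d: loss %.3f - val_loss %.3f'
          % (epoch + 1, epoch_losses[-1], float(val_loss)))
```

Every line inside the loop runs as ordinary Python, so a breakpoint on, say, the loss_compute call gives direct access to x, y_pred and loss as concrete tensors.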
The first thing that might come to mind after reading the code is: that is a lot of code just for doing a model.compile and a model.fit. Yes, true. But on the other hand, you now control everything that was happening under the hood before. And what was happening under the hood? The training loop.
So now things change. In this approach you can design how things are going to work from the ground up. Here are the things you can specify now:
- Metrics: ever wanted to measure results per sample, per batch, or by any other custom statistic? No problem, we've got you covered. Now you can use the good old moving average or any other custom metric based on whatever you want.
- Loss function: ever wanted to write a crazy loss function that depends on multiple parameters? Well, this is solved too: you can get as tricky as you want in the loss function definition without Keras complaining about it in its _standardize_user_data.
- Gradients: you can access the gradients and define the specifics of the forward and the backward pass. Yes, finally, so please, join me in a big: Hooray!
The metrics are specified with the new tf.keras.metrics API. You just take the metric you want, instantiate it and use it like this:
    # Instantiate the metric
    metric = tf.keras.metrics.Mean()

    # Run your model to get the loss and update the metric
    loss = [...]
    metric(loss)

    # Print the metric
    print('Training Loss: %.3f' % metric.result().numpy())
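To make the accumulation behaviour concrete, here is a small self-contained sketch with made-up loss values (4, 2 and 0 are purely illustrative). Each call updates a running state, and result() returns the mean of everything seen so far:

```python
import tensorflow as tf

metric = tf.keras.metrics.Mean()

# Pretend these are the losses of three consecutive batches (made-up values)
for batch_loss in [4.0, 2.0, 0.0]:
    metric(tf.constant(batch_loss))  # updates the running mean

running_mean = float(metric.result())
print('Training Loss: %.3f' % running_mean)  # mean of 4, 2 and 0
```

Creating a fresh metric object at the start of each epoch is one simple way to get per-epoch averages.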
The loss function and the gradients are computed in the forward and the backward pass respectively. In this approach, the forward pass must be recorded by a tf.GradientTape. The tf.GradientTape will track (or tape) all the tensor operations done in the forward pass so it can compute the gradients in the backward pass. In other words: in order to run backward, you must remember the path you took forward.
    # Forward pass: needs to be recorded by gradient tape
    with tf.GradientTape() as tape:
        y_pred = model(x)
        loss = loss_compute(y_true, y_pred)

    # Backward pass:
    gradients = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
This is pretty straightforward: in the forward pass you run your prediction and see how well you did by computing a loss. In the backward pass you check how your weights affected that loss by computing the gradients and then try to minimize the loss by updating the weights (with the help of an optimizer).
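The mechanics are easier to see on a single scalar weight. This is a worked sketch with a toy loss w² and a hand-written gradient descent step with a learning rate of 0.1, both chosen only for illustration:

```python
import tensorflow as tf

w = tf.Variable(3.0)

# Forward pass: the tape records how the loss was computed from w
with tf.GradientTape() as tape:
    loss = w * w

# Backward pass: d(w^2)/dw = 2w, so the gradient at w = 3 is 6
grad = tape.gradient(loss, w)

# One plain gradient descent step with learning rate 0.1: 3 - 0.1 * 6 = 2.4
w.assign_sub(0.1 * grad)

print(float(grad), float(w))
```

An optimizer such as Adam does the same kind of update, only with a smarter rule for the step size.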
Note also that, at the end of each epoch, the validation loss is computed (by running just the forward pass without updating the weights).
Well, let’s see how this compares to the previous approach (I have trimmed the output a bit so it fits here):
    Epoch 1: Loss: 1.310: 100%|███████████| 10000/10000 [00:41<00:00, 239.70it/s]
    Epoch 2: Loss: 0.018: 100%|███████████| 10000/10000 [00:41<00:00, 240.21it/s]
    Epoch 3: Loss: 0.010: 100%|███████████| 10000/10000 [00:41<00:00, 239.28it/s]
What happened? Have you noticed? It took 41 s per epoch on the same machine, that is, a 4x increase in time… And this is just a dummy model. Can you imagine how this would scale for a real use case model such as RetinaNet, YOLO or Mask R-CNN?
Luckily, the nice TensorFlow guys were aware of this, and implemented the graph mode.
The Graph Mode
The graph mode (from AutoGraph or tf.function) is a sort of mixed mode between the two previous ones. You can get a sense of what it is from the existing guides on the topic, but I found those guides a bit confusing, so I am explaining it in my own words.
If the Keras mode was about defining the graph and running it on the GPU later, and the eager mode was about executing each step interactively, the graph mode lets you code as if you were in eager mode but run the training almost as fast as if you were in Keras mode (so yes, as a graph on the GPU).
The only change with regard to the eager mode is that in the graph mode you break up the code into small functions and annotate those functions with the @tf.function decorator. Let’s take a look at how things change:
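Again, the original listing is not reproduced here. As a minimal sketch under the same assumptions as before (Adam optimizer, batch size 32, reduced dataset, hypothetical helper names loss_compute, forward_pass and backward_pass), the loop could be split like this:

```python
import tensorflow as tf

tf.random.set_seed(0)

train_samples = tf.random.normal(shape=(256, 20))
train_targets = tf.reduce_sum(train_samples, axis=-1, keepdims=True)
test_samples = tf.random.normal(shape=(64, 20))
test_targets = tf.reduce_sum(test_samples, axis=-1, keepdims=True)

inputs = tf.keras.Input(shape=(20,))
h = tf.keras.layers.Dense(20, activation='relu')(inputs)
h = tf.keras.layers.Dense(10, activation='relu')(h)
outputs = tf.keras.layers.Dense(1)(h)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
optimizer = tf.keras.optimizers.Adam()


def loss_compute(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))


@tf.function
def forward_pass(x, y_true):
    # Prediction and loss only; used for validation
    return loss_compute(y_true, model(x))


@tf.function
def backward_pass(x, y_true):
    # The forward pass is recorded on the tape, then gradients are applied
    with tf.GradientTape() as tape:
        loss = loss_compute(y_true, model(x))
    gradients = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
    return loss


train_data = tf.data.Dataset.from_tensor_slices(
    (train_samples, train_targets)).batch(32)

epoch_losses = []
for epoch in range(2):
    train_metric = tf.keras.metrics.Mean()
    for x, y_true in train_data:
        train_metric(backward_pass(x, y_true))  # runs as a graph
    val_loss = forward_pass(test_samples, test_targets)
    epoch_losses.append(float(train_metric.result()))
    print('Epoch %d: loss %.3f - val_loss %.3f'
          % (epoch + 1, epoch_losses[-1], float(val_loss)))
```

The outer Python loop and the metric update still run eagerly; only the bodies of the two decorated functions are traced into graphs.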
Now you see how the forward and backward pass computations have been refactored into 2 functions that have been annotated with the @tf.function decorator.
So what is really happening here? Easy. Whenever you annotate a function with the @tf.function decorator, you are “compiling” those operations into a graph, the same way Keras does. So by annotating your functions you tell TensorFlow to run those operations as an optimized graph on the GPU (when one is available).
Under the hood, what is really happening is that the function is parsed by AutoGraph (tf.autograph). AutoGraph takes the function’s inputs and outputs and generates a TensorFlow graph from them; that is, it converts the operations that lead from the inputs to the outputs into a TensorFlow graph. This generated graph runs very efficiently on the GPU.
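If you are curious about what AutoGraph produces, tf.autograph.to_code returns the generated source as a string. Here is a small sketch with a made-up function, absolute_if_negative_sum (a hypothetical name). Its Python if depends on a tensor value, which is the kind of construct AutoGraph rewrites into graph control flow:

```python
import tensorflow as tf


def absolute_if_negative_sum(x):
    # The condition depends on tensor data, so it cannot be decided
    # while tracing; AutoGraph turns this `if` into a graph conditional.
    if tf.reduce_sum(x) > 0:
        result = x
    else:
        result = -x
    return result


# Inspect the code AutoGraph generates for the plain Python function
generated = tf.autograph.to_code(absolute_if_negative_sum)
print(generated)

# Wrapping the function is equivalent to decorating it with @tf.function
graph_fn = tf.function(absolute_if_negative_sum)
negative_case = graph_fn(tf.constant([-1.0, -2.0]))
positive_case = graph_fn(tf.constant([1.0, 2.0]))
print(negative_case.numpy(), positive_case.numpy())
```

The exact generated code differs between TensorFlow versions, so treat the printout as something to read rather than rely on.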
This is why it is a sort of mixed mode: all the operations run interactively except the ones inside functions annotated with the @tf.function decorator.
This also means that you have access to all the variables and tensors except the ones inside functions decorated with @tf.function, for which you only have access to the inputs and outputs. This approach establishes a very clear way of debugging: you start developing interactively in eager mode and then, when your model is ready, push it to production performance with @tf.function. Sounds good, right? Let’s see how it goes:
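One way to see why the inside of a decorated function is opaque: plain Python statements in it run only while the function is being traced, not on every call. A small sketch (the function double and the list trace_log are made up for illustration):

```python
import tensorflow as tf

trace_log = []


@tf.function
def double(x):
    # Plain Python side effect: executed only while the graph is traced
    trace_log.append('traced')
    # Graph operation: executed on every call
    tf.print('running with', x)
    return x * 2


outputs = []
for value in [1.0, 2.0, 3.0]:
    # Same dtype and shape each time, so the traced graph is reused
    outputs.append(float(double(tf.constant(value))))

print(len(trace_log), outputs)
```

Three calls, one trace: a breakpoint or a print placed in the function body would fire during that single tracing run, where the tensors are symbolic placeholders rather than concrete values. This is why developing in eager mode first and adding the decorator afterwards works well.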
    Epoch 1: Loss: 1.438: 100%|████████████| 10000/10000 [00:16<00:00, 612.3it/s]
    Epoch 2: Loss: 0.015: 100%|████████████| 10000/10000 [00:16<00:00, 615.0it/s]
    Epoch 3: Loss: 0.009: 72%|████████████| 7219/10000 [00:11<00:04, 635.1it/s]
Well, an amazing 16 s/epoch. You might think that it is not as fast as the Keras mode but, on the other hand, you get all the debugging features and very close performance.
Conclusions
If you have followed the whole article, it won’t come as a surprise that in the end it all comes down to the very old software problem: flexibility or efficiency? Eager mode or Keras mode? Well, why settle? Use the graph mode!
In my view, the TensorFlow guys have done excellent work in providing more flexibility for us developers without compromising too much on efficiency. So from where I stand, I can only say bravo to them.