This article is not just about Machine Learning and Object Detection, it's about Elixir interoperability and how we can take advantage of Python's fantastic set of ML libraries, bringing their features into the Elixir world.
We see how to bring YOLO, a state-of-the-art real-time object detection system, into a Phoenix web app.
We start with Python, by building a small app which does the actual object detection. Then we focus on the Elixir-Python interoperability, building an Elixir wrapper around the Python app, using Ports.
The second part of the article is all about using our YOLO Elixir module in Phoenix, at first detecting objects on single images and then doing real-time object detection using the computer’s webcam.
In the poeticoding/yolo_example GitHub repo you can find all the code we see here, both the Phoenix examples and the object detection Python script.
Table of Contents
- Making Elixir and Python work together
- YOLO Object Detection in Python
- Ports and detect.py
- Yolo Phoenix app
- Yolo.Worker
- Detect Objects in Uploaded Images
- Object Detection with a Webcam
- Frontend – Phoenix Channel, Webcamjs and canvas
- Backend – WebcamChannel
- Drop frames and dynamically adapt
Making Elixir and Python work together
We are not going to implement the YOLO algorithm ourselves, that's for sure! It would not make any sense, since there are great easy-to-use Python libraries that implement YOLOv3 for us.
cvlib it’s a high level library that runs object detection with just a few lines of code; it uses OpenCV and TensorFlow under the hood. We don’t even need to train a model our-self: cvlib uses a model pre-trained on the COCO dataset , capable of detecting 80 common objects.
But how can we take advantage of this Python library, letting Elixir talk with Python?
The simplest way would be to use System.cmd: we run our Python object detection script, passing the image path as an argument and waiting for the program to exit and return the result. Unfortunately this is too slow: before detecting the objects, our Python code needs to load the libraries and the YOLOv3 model in memory, which could take a few seconds (on my laptop it takes around 2 seconds).
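For a rough idea of what that one-shot approach would look like, here is a minimal sketch (the script name and arguments are illustrative; the detect.py we build later communicates over a Port instead):

```elixir
# One-shot detection with System.cmd: Python reloads the libraries and
# the YOLOv3 model on every call, paying the ~2s startup cost each time.
{output, 0} = System.cmd("python", ["detect.py", "dog.jpg"])
IO.puts(output)
```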
Since we can’t wait to load the model for each detection, we are going to use Port to run our Python app as a long running process (operating system process, external to the Erlang VM) which holds the model in memory and communicates with Elixir via stdin
/ stdout
.
Python receives the data via stdin
and sends back the result writing it to stdout
. Everything written to stdout
is sent to the Elixir process’ mailbox (the one that opened the Port). We’ll see in detail how to use Ports to build our Elixir wrapper, but if you never used Ports, Outside Elixir (written by Saša Jurić) it’s a great in-depth read!
There are also other ways to handle the Elixir–Python interoperability. We could, for example, use Pyrlang to run a Python node as part of our Elixir cluster. Or we could run the Python app with an HTTP server like Flask, letting Elixir and Python communicate via HTTP. Each one has its own pros and cons.
I preferred to go with Ports because it’s really easy to detect crashes and it’s a solution that works seamlessly on my computer, on a server or on an embedded device (like the Nvidia Jetson Nano or Raspberry Pi).
YOLO Object Detection in Python
Let’s start easy, with a really simple Python script that processes only one image. It starts by loading cvlib and the YOLOv3 model, then detects the objects present in the dog.jpg image.
First, we need to create a new Python virtual environment and install OpenCV, TensorFlow and cvlib. Anaconda makes it easy to create a new Python virtual environment. With the conda command we create a new python3.6 environment called yolo.
$ conda create -n yolo python=3.6
Once created, we need to activate the new environment, then install OpenCV and TensorFlow with conda and cvlib with pip.
```
$ conda activate yolo
$ conda install tensorflow opencv
$ pip install cvlib
```
```python
# detect.py
import cv2
import cvlib as cv

img = cv2.imread("dog.jpg")
boxes, labels, _conf = cv.detect_common_objects(img, model="yolov3")
print(labels, boxes)
```
This script is really simple: it imports cv2 (OpenCV) and cvlib, then it loads the dog.jpg image (the one you see above) into memory and passes it to the cv.detect_common_objects function, using the YOLOv3 model. At the end, it prints the detected objects' labels and bounding boxes.
```
$ python detect.py
Using TensorFlow backend.
Downloading yolov3.cfg from https://github.com/arunponnusamy/object-detection-opencv/raw/master/yolov3.cfg
Downloading yolov3.weights from https://pjreddie.com/media/files/yolov3.weights
Downloading yolov3_classes.txt from https://github.com/arunponnusamy/object-detection-opencv/raw/master/yolov3.txt
['dog', 'bicycle', 'truck'] [[122, 223, 320, 543], [117, 124, 569, 432], [472, 86, 692, 166]]
```
Fantastic! With just a few lines of code we are able to detect objects in an image! The script tells us there are a dog, a bicycle and a truck, and where they are located.
The first time you run the script, cvlib downloads three files for us (yolov3.cfg, yolov3.weights and yolov3_classes.txt), which are used to load the YOLOv3 model.
What about speed?
```
$ time python detect.py
...
real    0m2.252s
user    0m3.176s
sys     0m0.638s
```
On my MacBook Pro 2018 (with an i9) it takes more than 2 seconds… too much if we want to detect objects in real-time. But most of this time is spent loading the libraries and the model; the detection itself takes around 0.2s.
```python
# detect.py
...
import time

start = time.time()
boxes, labels, _conf = cv.detect_common_objects(img, model="yolov3")
print("first detection: ", time.time() - start)

start = time.time()
boxes, labels, _conf = cv.detect_common_objects(img, model="yolov3")
print("second detection: ", time.time() - start)
```
```
$ python detect.py
Using TensorFlow backend.
first detection:  0.63
second detection:  0.21
```
Loading the cv2 and cvlib libraries takes around 1.4s, and the first time we call cv.detect_common_objects(img, model="yolov3") it takes 0.63s, since cvlib needs to load the model into memory; the second call is much faster (0.21s). That's why we can't run this script with System.cmd for each detection, and why we need a long-running process which keeps the model in memory!
0.21s means that the best I can get from my laptop (a MacBook Pro 15 2018 with a 6-core 2.9GHz i9) is around 4 detections per second. Can we do any better? Definitely! But with a GPU. Using an Nvidia GTX 1080 we should reach 0.03s (30ms) per detection (check TensorFlow GPU).
We can also use Darknet , a Neural Network Framework written in C and CUDA. When just running on CPU the OpenCV implementation is faster than Darknet; but Darknet really shines when compiled with CUDA running on a GPU!
I did many benchmarks, both locally and on the cloud. The fastest my computer could process the dog.jpg image was ~0.2s. On the cloud I've tried to run YOLO on both CPU and GPU: on AWS, to reach 0.2s per image, I needed a C5.4xlarge instance (which costs $0.68/hour). But I got the most interesting result with a P3 instance (an expensive one with the Nvidia Tesla GPU!) and Darknet, processing an image in just 0.03s!
I’ve bought an Nvidia Jetson Nano , a small computer with a 128-core Nvidia GPU that runs with only 10W. My idea is to install Elixir on it, compile Darknet with CUDA and write a NIF for Yolo object detection (more on this in the coming weeks, stay tuned!)
To make our object detection faster, at the expense of accuracy, we can use a smaller model called tiny YOLO (yolov3-tiny).
```python
boxes, labels, _conf = cv.detect_common_objects(
    img, model="yolov3-tiny"
)
```
```
$ python detect.py
...
detection after warmup: 0.032
['dog', 'car'] [[124, 218, 382, 518], [466, 82, 686, 172]]
```
The tiny version is 8 times faster on my laptop (only 32ms to process dog.jpg), but it's also less accurate: it doesn't detect the bicycle, and the truck is now detected as a car.
Elixir Ports
Let’s see first a simple example on how to use Port to communicate with a Python script.
In this example Elixir sends to Python a string with a list of numbers. The Python script converts this string into a list of integers, sums the numbers and sends back the result to Elixir.
Each message, sent from either side, is a string ending with a newline – in this way it is really easy to distinguish different messages, because they are just separate lines. However, we'll see later that when sending images we can't rely on newlines as a separator and we'll have to find a different approach.
```python
# python_scripts/add.py
import sys

for line in sys.stdin:
    # expecting a line in the form of "num,num\n"
    line = line.strip()

    # EOF
    if line == "":
        break

    # strings to ints, and sum
    values = line.split(",")
    nums = map(int, values)
    result = sum(nums)

    # send the result via stdout
    sys.stdout.write(str(result) + "\n")
    sys.stdout.flush()
```
This Python script reads a line from stdin, strips whitespace and newlines, splits the string and converts the elements into integers, then sums them and writes the result as a string to stdout.
The Elixir process will then receive the result as a message in the mailbox.
Ok, let’s now use Port.open/2 to run the add.py
python script and Port.command/3 to send data to it.
```
iex> port = Port.open({:spawn, "python add.py"}, [:binary])
#Port<0.5>
iex> Port.command(port, "2,5\n")
true
iex> flush()
{#Port<0.5>, {:data, "7\n"}}
```
We see how, by sending the "2,5\n" string to the stdin of the Python application via port, we receive a message with the result in the process mailbox.
Let’s write a function that does everything for us: encodes the list of integers into a string, sends the message to the python script, waits for the result message and returns it as an integer.
```elixir
add = fn port, nums ->
  # integers to a string
  msg =
    nums
    |> Enum.map(&to_string/1)
    |> Enum.join(",")

  # sending the msg and ending "\n" as an iolist
  Port.command(port, [msg, "\n"])

  # receive the result and convert it to an integer
  receive do
    {^port, {:data, result}} ->
      String.trim(result) |> String.to_integer()
  end
end
```

```
iex> add.(port, [1,2,3,4,5])
15
```
Ports and detect.py
The goal of this part is to write a detect.py Python script that receives images from Elixir and sends back the result of the detection.
The idea is similar to what we've seen in the previous example: Elixir sends an image to the Python script through a Port; on the other side, the Python script reads the image from stdin, runs object detection and sends the result back to Elixir by writing to stdout.
Using strings with ports is a simple and quick solution, but we now need to send images and we can't rely anymore on newlines as a separator between messages. An easy way to get the job done would be to encode the image to a base64 string, but this would add a 33% overhead in the message size, plus encoding/decoding steps.
To understand when the message terminates, we prepend a 4-byte header to each message. In this header we put the message's size, encoded as an unsigned big-endian integer.
Along with the image, we send an image id as well, which is useful to keep track of multiple images sent asynchronously to the Python process. In this way, when Elixir receives an object detection result, it knows which image the result refers to.
A variable-length image id would require sending its size too. For simplicity, we make it fixed, using a 16-byte UUID4 (we can use the uuid library to generate UUID4 ids).
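As a quick sketch of what such an id looks like (using the uuid Hex package we add to the project later on):

```elixir
# UUID.uuid4/0 returns a 36-character string like
# "f81d4fae-7dec-11d0-a765-00a0c91e6bf6";
# UUID.string_to_binary!/1 converts it to its 16-byte binary form.
image_id = UUID.uuid4() |> UUID.string_to_binary!()
byte_size(image_id)
#=> 16
```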
On Elixir, by opening a port with the {:packet, 4} option, we don't have to think about adding and reading the message's size.
```elixir
port = Port.open(
  {:spawn, "python3 detect.py"},
  [:binary, {:packet, 4}]
)
```
When sending data to Python (using Port.command), the message's size is automatically prepended; when reading data from the Python stdout, the Port automatically reads the first 4 bytes to determine the message's size.
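Just to make the framing explicit, this is roughly what {:packet, 4} does for us behind the scenes (an illustrative sketch – with the option enabled we never build this header ourselves; image_id and image are assumed to be the binaries we want to send):

```elixir
payload = [image_id, image]
size = IO.iodata_length(payload)

# 4-byte unsigned big-endian length header, followed by the payload:
# the same layout the Python side decodes with unpack("!I", header)
framed = [<<size::unsigned-big-integer-size(32)>>, payload]
```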
At the moment we are using stdin and stdout, which isn't great since any of the libraries could write to stdout. We'd also like to use stdout ourselves, to print debugging messages on the terminal.
By adding :nouse_stdio to the Port.open options, we ask the Port to use file descriptor 3 (instead of stdin) and 4 (instead of stdout). So, the Port.open code becomes:
```elixir
port = Port.open(
  {:spawn, "python3 detect.py"},
  [:binary, :nouse_stdio, {:packet, 4}]
)
```
On the Python side, we open the file descriptors 3 and 4 in binary mode with os.fdopen. We define a setup_io() function, which returns a tuple with two opened file objects connected to the file descriptors.
```python
# detect.py
import os

# setup of FD 3 for input (instead of stdin)
# FD 4 for output (instead of stdout)
def setup_io():
    return os.fdopen(3, "rb"), os.fdopen(4, "wb")
```
To read the first 4 bytes (a big-endian unsigned int) and get an int, we use unpack with the "!I" format (! is for big-endian byte order and I for a 4-byte unsigned integer).
```python
# detect.py
import numpy as np
import cv2, sys
from struct import unpack, pack

UUID4_SIZE = 16

def read_message(input_f):
    # read the first 4 bytes with the length of the message:
    # the first 16 bytes of the message are the UUID,
    # the rest is the image
    header = input_f.read(4)
    if len(header) != 4:
        return None  # EOF

    (total_msg_size,) = unpack("!I", header)

    # image id
    image_id = input_f.read(UUID4_SIZE)

    # read image data
    image_data = input_f.read(total_msg_size - UUID4_SIZE)

    # converting the binary to an opencv image
    nparr = np.fromstring(image_data, np.uint8)
    image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)

    return {'id': image_id, 'image': image}
```
The read_message() function reads the first 4 bytes from input_f (the input file object connected to file descriptor 3) and unpack("!I", header) decodes the header into the total_msg_size integer. It then reads the 16-byte image id, followed by total_msg_size - 16 bytes of image data.
At the end, the function converts the received image into a ready-to-use OpenCV image and returns a dictionary with image and id. When the header size is less than 4 bytes, it returns None instead. This happens when the Port is closed: closing a port doesn't kill the Python process, but it closes the input (stdin or 3) and output (stdout or 4) file descriptors.
Then we write the detect(image, model) function, which detects the objects using cvlib and returns a tuple.
```python
# detect.py
import cvlib as cv

def detect(image, model):
    boxes, labels, _conf = cv.detect_common_objects(image, model=model)
    return boxes, labels
```
The first argument is an OpenCV image and the second is the model name (like "yolov3" or "yolov3-tiny").
We now just need a write_result function with output_f, image_id, image_shape, boxes and labels arguments.
```python
# detect.py
import json
from struct import unpack, pack

def write_result(output, image_id, shape, boxes, labels):
    result = json.dumps({
        'shape': shape, 'boxes': boxes, 'labels': labels
    }).encode("ascii")

    total_msg_size = len(result) + UUID4_SIZE
    header = pack("!I", total_msg_size)

    output.write(header)
    output.write(image_id)
    output.write(result)
    output.flush()
```
We encode the result into a JSON string and write the message's total size, total_msg_size, to output_f; this is the result string length + 16 (the UUID4 size). pack("!I", total_msg_size) converts the integer into a 4-byte header. We then write the image_id and the result.
We now have everything we need to write the detect.py script's main loop!
```python
# detect.py
# def read_message ...
# def detect ...
# def write_result ...

def run(model):
    input_f, output_f = setup_io()

    while True:
        msg = read_message(input_f)
        if msg is None:
            break

        # image shape
        height, width, _ = msg["image"].shape
        shape = {'width': width, 'height': height}

        # detect objects
        boxes, labels = detect(msg["image"], model)

        # send result back to elixir
        write_result(output_f, msg["id"], shape, boxes, labels)

if __name__ == "__main__":
    model = "yolov3"
    if len(sys.argv) > 1:
        model = sys.argv[1]
    run(model)
```
At the end of the script we deal with the arguments and call run(model). By default, detect.py runs the full yolov3 model; by passing a different model name as the script's argument, we can load yolov3-tiny instead.
```
$ python3 detect.py yolov3-tiny
Using TensorFlow backend.
```
You can find the full script here.
Let’s now open a port running the detect.py
script and detect objects on dog.jpg
image
```
iex> port = Port.open({:spawn, "python3 detect.py"}, [:binary, {:packet, 4}])
#Port<0.5>
iex> id = :crypto.strong_rand_bytes(16)
<<225, 211, 65, 208, ...>>
iex> image = File.read!("dog.jpg")
iex> Port.command(port, [id, image])
true
iex> flush
{#Port<0.5>, {:data, <<225, 211, 65, 208, 60, ...>>}}
```
To get a random 16-byte image id we've simply used :crypto.strong_rand_bytes(16) (we'll later use uuid to generate a UUID4). Then, we send the id and the image binary as an iolist. Using flush, we see that we've received a message from port.
Let’s use pattern matching to extract the image_id
and the result’s json string.
```
iex> Port.command(port, [id, image])
true
iex> receive do
...>   {^port, {:data, <<image_id::binary-size(16), json_string::binary()>>}} ->
...>     {image_id, json_string}
...> end
{<<225, 211, 65, 208, ...>>,
 "{\"labels\": [\"dog\", \"bicycle\", \"truck\"], \"shape\": {\"width\": 768, \"height\": 576}, \"boxes\": [[123, 222, 319, 544], [118, 124, 568, 432], [473, 86, 691, 166]]}"}
```
Yolo Phoenix app
Let’s now create a yolo
Phoenix project in which we’ll write the rest of the code. You find the full code with the examples on the poeticoding/yolo_examples GitHub repo.
Since we don’t need a database, we pass the --no-ecto
option
$ mix phx.new yolo --no-ecto
We then add the uuid library to the dependencies in mix.exs
```elixir
# mix.exs
def deps do
  [
    ...
    {:uuid, "~> 1.1"},
  ]
end
```
and run mix deps.get.
Yolo.Worker GenServer
poeticoding/yolo_example/lib/yolo/worker.ex
Building a Yolo.Worker module that implements the GenServer behaviour and wraps our Port brings many advantages: it becomes easy to supervise the process, we can hide the complexity behind a simple interface and we can easily spawn a pool of Yolo workers.
Yolo.Worker should handle multiple asynchronous requests from different processes, while taking care of the communication with detect.py via the Port.
In the diagram above, both the #PID<0.110.0> and #PID<0.112.0> processes send an image to Yolo.Worker. When the Yolo.Worker process receives the image with id <<id_1>> from #PID<0.110.0>, it forwards this request to the Python process. While waiting for a result, Yolo.Worker can accept new requests and keeps a record of all the pending requests (image ids and requesting process pids). In this way Yolo.Worker knows to which process it has to forward a result once received – in the example above, once it receives the result with image id <<id_1>>, Yolo.Worker forwards it to #PID<0.110.0>.
start_link, init and config
Let’s start by writing the module’s start_link/1
and init/1
functions.
```elixir
# lib/yolo/worker.ex
defmodule Yolo.Worker do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, opts)
  end

  def init(:ok) do
    config = config()

    port = Port.open(
      {:spawn_executable, config.python},
      [:binary, :nouse_stdio, {:packet, 4},
       args: [config.detect_script, config.model]]
    )

    {:ok, %{port: port, requests: %{}}}
  end

  ...
end
```
We start the Port with :spawn_executable instead of :spawn. With :spawn we were passing the full shell command, while with :spawn_executable we need to pass the full path of the python executable – all the arguments (detect_script and model) are passed with the :args option.
But before starting the port, init(:ok) loads the worker configuration to get the model name and the full paths of the python executable and of the detect.py script. We set the configuration in config/dev.exs:
```elixir
# config/dev.exs
...
config :yolo, Yolo.Worker,
  python: "/opt/anaconda3/envs/yolo/bin/python",
  detect_script: "/Users/alvise/yolo/python_scripts/detect.py",
  model: {:system, "YOLO_MODEL"}
```
and config/0 loads the configuration, getting the :model value from the YOLO_MODEL environment variable.
```elixir
# lib/yolo/worker.ex
@default_config [
  python: "python",
  detect_script: "python_scripts/detect.py",
  model: "yolov3"
]

def config do
  @default_config
  |> Keyword.merge(Application.get_env(:yolo, __MODULE__, []))
  # loads the values from env variables when {:system, env_var_name}
  |> Enum.map(fn
    # it finds the full path when not provided
    {:python, path} ->
      {:python, System.find_executable(path)}

    # it loads the value from the environment variable
    # when the env variable is not set, it defaults to @default_config[option]
    {option, {:system, env_variable}} ->
      {option, System.get_env(env_variable, @default_config[option])}

    # all the other options
    config ->
      config
  end)
  |> Enum.into(%{})
end
```
In my case, the yolo env's python executable is at /opt/anaconda3/envs/yolo/bin/python; when using this full path we don't need to load the yolo anaconda environment, like we did before with conda activate yolo. I've also placed detect.py into the python_scripts directory of the yolo Phoenix app.
When the YOLO_MODEL environment variable isn't set, it defaults to "yolov3".
init(:ok) then returns a state with the opened port and an empty requests map, which we'll use to keep track of the pending requests.
Request a detection
The function we call to request an object detection is request_detection/3
```elixir
# lib/yolo/worker.ex
def request_detection(pid, image) do
  image_id = UUID.uuid4() |> UUID.string_to_binary!()
  request_detection(pid, image_id, image)
end

@uuid4_size 16
def request_detection(pid, image_id, image)
    when byte_size(image_id) == @uuid4_size do
  GenServer.call(pid, {:detect, image_id, image})
end
```
request_detection/3 needs the pid of the Yolo.Worker GenServer, a 16-byte image_id and the image data. It also checks the size of the image id.
The function makes a GenServer.call: it sends {:detect, image_id, image} to the GenServer and waits for the reply. The detection itself is asynchronous – the Yolo.Worker process doesn't wait for the result of the detection from Python; instead it returns the image_id. I preferred to use a call, instead of a cast, to have a confirmation that the Yolo.Worker GenServer received the request.
In case we don't provide an image_id ourselves, request_detection/2 generates a UUID4 image_id for us.
```elixir
# lib/yolo/worker.ex
def handle_call({:detect, image_id, image_data}, {from_pid, _}, worker) do
  Port.command(worker.port, [image_id, image_data])
  worker = put_in(worker, [:requests, image_id], from_pid)
  {:reply, image_id, worker}
end
```
The handle_call/3 callback is pretty simple. Once Yolo.Worker receives a :detect request, it sends image_id and image_data to the port, which is held in the worker map (the process state). To keep track of the pending detection, the image_id is set as a key of the worker.requests map, with from_pid as its value.
Handling the result
When detect.py has processed the image and sent the result, the port sends a message to the Yolo.Worker process. This message is handled by the handle_info/2 callback.
```elixir
# lib/yolo/worker.ex
def handle_info({port, {:data, <<image_id::binary-size(@uuid4_size), json_string::binary()>>}}, %{port: port}=worker) do
  result = get_result!(json_string)

  # getting from_pid and removing the request from the map
  {from_pid, worker} = pop_in(worker, [:requests, image_id])

  # sending the result map to from_pid
  send(from_pid, {:detected, image_id, result})

  {:noreply, worker}
end

defp get_result!(json_string) do
  result = Jason.decode!(json_string)

  %{
    shape: %{width: result["shape"]["width"], height: result["shape"]["height"]},
    objects: get_objects(result["labels"], result["boxes"])
  }
end
```
If you remember, the type of message that the port sends is {#Port<...>, {:data, <<...>>}}. We pattern match the message, making sure that the port is the one in the process' state; we also extract the image_id and json_string with <<image_id::binary-size(@uuid4_size), json_string::binary()>>, where @uuid4_size is 16.
Instead of just decoding the JSON string to a map, with get_result!(json_string) we build a map with :shape and a detected :objects list. The :objects list is generated by get_objects/2, which we'll see in a moment.
handle_info/2 then pops from_pid from the worker.requests map and sends the result to from_pid. It returns an updated worker state at the end.
```elixir
# lib/yolo/worker.ex
def handle_info(...) do
  result = get_result!(json_string)

  # get from_pid and removing the request from the map
  {from_pid, worker} = pop_in(worker, [:requests, image_id])

  # send the result map to from_pid
  send(from_pid, {:detected, image_id, result})

  {:noreply, worker}
end
```
get_objects(labels, boxes)
In the JSON string we don't have a list of objects, we just have two separate lists, labels and boxes. The first box in boxes refers to the first label in labels, and so on…
labels = ["dog", "bicycle", "truck"] boxes = [[122, 224, 320, 542], [118, 124, 568, 432], [473, 86, 691, 166]]
Each box element is a bounding-box: top-left and bottom-right x,y coordinates.
get_objects(labels, boxes) transforms the two lists into an object list, where each object is a map with a :label and :x, :y top-left coordinates, plus :w (width) and :h (height) of the bounding box.
```elixir
# lib/yolo/worker.ex
def get_objects(labels, boxes) do
  Enum.zip(labels, boxes)
  |> Enum.map(fn {label, [x, y, bottom_right_x, bottom_right_y]} ->
    w = bottom_right_x - x
    h = bottom_right_y - y
    %{label: label, x: x, y: y, w: w, h: h}
  end)
end
```
```
iex> Yolo.Worker.get_objects(
...>   ["dog", "bicycle", "truck"],
...>   [[122, 224, 320, 542], [118, 124, 568, 432], [473, 86, 691, 166]])
[
  %{h: 318, label: "dog", w: 198, x: 122, y: 224},
  %{h: 308, label: "bicycle", w: 450, x: 118, y: 124},
  %{h: 80, label: "truck", w: 218, x: 473, y: 86}
]
```
Try it on iex
Let’s try Yolo.Worker
on iex
!
```
iex> {:ok, worker_pid} = Yolo.Worker.start_link([])
{:ok, #PID<0.304.0>}
iex> image = File.read!("dog.jpg")
<<255, 216, 255, 225, ...>>
iex> image_id = Yolo.Worker.request_detection(worker_pid, image)
<<3, 76, 254, 221, ...>>
iex> flush
{:detected, <<3, 76, 254, 221, ...>>,
 %{
   objects: [
     %{h: 318, label: "dog", w: 198, x: 122, y: 224},
     %{h: 308, label: "bicycle", w: 450, x: 118, y: 124},
     %{h: 80, label: "truck", w: 218, x: 473, y: 86}
   ],
   shape: %{height: 576, width: 768}
 }}
```
Great, it works! :tada:
await/2
It’s useful to have an await(image_id, timeout)
function that awaits a :detected
message and returns the result – Let’s add it to the Yolo.Worker
module.
```elixir
# lib/yolo/worker.ex
def await(image_id, timeout \\ 5_000) do
  receive do
    {:detected, ^image_id, result} -> result
  after
    timeout -> {:timeout, image_id}
  end
end
```
```
iex> worker_pid \
...> |> Yolo.Worker.request_detection(image) \
...> |> Yolo.Worker.await()
%{
  objects: [...],
  shape: %{...}
}
```
Supervised Yolo.Worker
We can easily make Yolo.Worker supervised, adding it as a child of the application Supervisor in Yolo.Application.start/2. By passing the [name: Yolo.Worker] option, the process is registered locally with the given name. In this way we can just use the Yolo.Worker name instead of the pid – this is pretty useful since the pid can change due to crashes or Supervisor restarts.
```elixir
defmodule Yolo.Application do
  use Application

  def start(_type, _args) do
    children = [
      YoloWeb.Endpoint,
      # one worker named Yolo.Worker
      {Yolo.Worker, [name: Yolo.Worker]},
    ]

    opts = [strategy: :one_for_one, name: Yolo.Supervisor]
    Supervisor.start_link(children, opts)
  end

  ...
end
```
```
$ iex -S mix
Using TensorFlow backend
iex> Yolo.Worker.request_detection(Yolo.Worker, File.read!("dog.jpg")) \
...> |> Yolo.Worker.await()
%{ objects: [...], ...}
```
When starting the application (an iex session in this case), Yolo.Worker is started automatically by the Supervisor (cvlib, in the Python script, prints the Using TensorFlow backend message on stderr).
Without closing iex, let's open another terminal and see what happens when we kill the Python process and send another detection request.
```
# on another terminal
$ ps aux | grep -i detect.py
alvise  15206 ... /opt/anaconda3/envs/yolo/bin/python python_scripts/detect.py yolov3
$ kill -9 15206
```
```
iex> Yolo.Worker.request_detection(Yolo.Worker, File.read!("dog.jpg")) |> Yolo.Worker.await()
[error] GenServer Yolo.Worker terminating
** (ArgumentError) argument error
    :erlang.port_command(#Port<0.6>, ...)
...
Using TensorFlow backend.
iex> Yolo.Worker.request_detection(Yolo.Worker, File.read!("dog.jpg")) |> Yolo.Worker.await()
%{ objects: [...], ...}
```
When we kill the Python process (or, more realistically, when it just crashes) the port closes automatically. Then, when Yolo.Worker tries to send data to the closed port by calling Port.command(port, [image_id, image]), the Yolo.Worker process crashes. The Supervisor catches the crash and starts another Yolo.Worker process, ready to serve new requests.
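Earlier I mentioned that we could easily spawn a pool of Yolo workers. The article's repo sticks to a single worker, but as a rough sketch (names and pool size are purely illustrative) we could start a few named workers under the same Supervisor and pick one per request:

```elixir
# In Yolo.Application.start/2 – a hypothetical pool of three named workers.
# Note that each worker starts its own Python process and loads its own model.
workers =
  for i <- 1..3 do
    Supervisor.child_spec(
      {Yolo.Worker, [name: :"yolo_worker_#{i}"]},
      id: :"yolo_worker_#{i}"
    )
  end

children = [YoloWeb.Endpoint | workers]
Supervisor.start_link(children, strategy: :one_for_one, name: Yolo.Supervisor)

# A naive dispatcher: pick a random worker for each detection request.
worker = :"yolo_worker_#{Enum.random(1..3)}"
Yolo.Worker.request_detection(worker, File.read!("dog.jpg"))
```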
Detect Objects in Uploaded Images
Now that Yolo.Worker does the heavy lifting, we can use it in a Phoenix app to detect objects in uploaded images. When a user uploads an image via a <form>, we run object detection on the uploaded image and show labels and bounding boxes using SVG.
Let’s start by creating a new YoloWeb.UploadController
module in lib/yolo_web/controllers/upload_controller.ex with new
and create
actions, an empty YoloWeb.UploadView
, and add the new routes in YoloWeb.Router
( /lib/yolo_web/router.ex) .
```elixir
# lib/yolo_web/router.ex
defmodule YoloWeb.Router do
  ...
  scope "/", YoloWeb do
    ...
    resources "/uploads", UploadController, only: [:new, :create]
  end
end
```
```elixir
# lib/yolo_web/views/upload_view.ex
defmodule YoloWeb.UploadView do
  use YoloWeb, :view
end
```
```elixir
# lib/yolo_web/controllers/upload_controller.ex
defmodule YoloWeb.UploadController do
  use YoloWeb, :controller

  def new(conn, _params) do
    render(conn, "new.html")
  end

  ...
end
```
Below is the lib/yolo_web/templates/upload/new.html.eex template:
```html
<%= form_for @conn, Routes.upload_path(@conn, :create), [multipart: true], fn f -> %>
  <%= file_input f, :upload, class: "form-control" %>
  <%= submit "Detect", class: "button" %>
<% end %>
```
The new action in UploadController renders a form, which simply uploads the selected image to the /uploads path via HTTP POST.
If you want to take a deeper look at uploads in Phoenix, I wrote a series of articles on how to handle uploads on Phoenix, upload with Javascript and make a progress bar .
When the image is uploaded, the create action in YoloWeb.UploadController is called, passing a Plug.Upload struct in params.
```elixir
# lib/yolo_web/controllers/upload_controller.ex
def create(conn, %{"upload" => %Plug.Upload{}=upload}=_params) do
  data = File.read!(upload.path)

  detection =
    Yolo.Worker.request_detection(Yolo.Worker, data)
    |> Yolo.Worker.await()

  base64_image = base64_inline_image(data, upload.content_type)

  render(conn, "show.html", image: base64_image, detection: detection)
end

defp base64_inline_image(data, content_type) do
  image64 = Base.encode64(data)
  "data:#{content_type};base64, #{image64}"
end
```
create/2 reads the image data from the upload.path temporary path and simply runs the detection, awaiting the result.
Since I really didn’t want to use JavaScript for this example, I decided to render the final result (image, boxes and labels) using just SVG. It turned out to be much easier than playing with JavaScript Canvas.
To avoid storing the image locally (and having to serve it), we can embed it in the SVG. To do so, we need to convert data to its base64 representation with base64_inline_image/2.
create/2 then renders the show.html template, passing image and detection.
lib/yolo_web/templates/show.html.eex
```html
<svg width="<%= @detection.shape.width %>" height="<%= @detection.shape.height %>">
  <g fill="grey" transform="scale(0.5 0.5)">
    <image width="<%= @detection.shape.width %>" height="<%= @detection.shape.height %>"
           xlink:href="<%= @image %>"></image>

    <%= for o <- @detection.objects do %>
      <rect x="<%= o.x - 2 %>" y="<%= o.y - 20 %>" height="20" width="100" fill="blue"/>
      <text x="<%= o.x %>" y="<%= o.y %>" dy="-5"
            font-family="sans-serif" font-size="16px" font-weight="bold"
            fill="white"><%= o.label %></text>
      <rect x="<%= o.x %>" y="<%= o.y %>" width="<%= o.w %>" height="<%= o.h %>" />
    <% end %>
  </g>
</svg>
```
We render an svg, setting its width and height to the original image's shape. Inside the svg tag, we render an <image> with the inline base64 @image set as the xlink:href attribute.
Then, we enumerate the detected objects, rendering a <text> tag for the label and a <rect> for the bounding box. To position these SVG elements we simply use the object coordinates.
Fantastic, we can finally see labels and bounding boxes on an image, noticing how accurate YOLO is!
Some considerations
Can we use this straight away on a production cloud server? As always, it depends! YOLOv3 is fast – with a good Nvidia GPU it takes just 30ms to detect objects in an image – but it's an expensive computation that can easily exhaust a server's CPU/GPU!
So, it depends on the throughput we need (the number of images processed per unit of time), the hardware or budget we have, and the accuracy we want to get.
I’ve previously talked briefly about speed; to process and image in ~0.2s we need an AWS C5.4xlarge instance, which isn’t cheap. Now, this could be more than enough in some situations or form a bottleneck in others: on an AWS C5.4xlarge instance, if we’d need to detect objects in real-time on 10-15 images per second, the requests would pile-up leading to timeouts.
We could delegate the object detection job to services like AWS Rekognition or Google Vision, which are fantastic, but they are not a silver bullet. Especially when we are just interested in running real-time object detection on an embedded device: we'd need an internet connection, each frame would suffer the delay introduced by the network, and we would also risk seeing the cloud bill grow really fast!
Object Detection with a Webcam
Let’s make it more interesting, processing frames coming from the computer’s webcam feed! This can be useful on embedded devices with a camera, to detect objects in real-time and act accordingly.
For simplicity, in this example we are going to use a browser and HTML5 to get frames from the webcam and to render the labels and bounding boxes on the webpage.
We use JavaScript and webcamjs on the front-end to get 720p camera frames and send them to the Phoenix server via Channels. The channel's process sends an asynchronous request to Yolo.Worker with the given frame and, once it receives the detection result, it pushes a detected event to the browser.
Frontend – Phoenix Channel, Webcamjs and canvas
Let’s start with the frontend. We create a new YoloWeb.WebcamController
which simply renders lib/templates/webcam/index.html.eex
, where we have div#camera
, which is the element where we show the webcam stream, canvas#objects
where we render labels and boxes, and button#detect_button
to start and stop the detection.
```html
<button id="start_stop">Start</button>
<div>
  <div id="camera"></div>
  <canvas id="objects" width="1280" height="720"></canvas>
</div>
```
After adding the webcamjs library to assets/package.json, we create a new JavaScript module in assets/js/webcam.js and import it in app.js. In webcam.js, we import Webcam, connect the socket and join the webcam:detection channel.
```javascript
// assets/js/webcam.js
import Webcam from "webcamjs"
import { Socket } from "phoenix"

let socket = new Socket("/socket")
socket.connect()

// Now that you are connected, you can join channels with a topic:
let channel = socket.channel("webcam:detection", {})
channel.join()
  .receive("ok", resp => {
    console.log(`Joined successfully to "webcam:detection"`, resp)
  })
  .receive("error", resp => { console.log("Unable to join", resp) });
```
We then set the camera options and attach it to the #camera element.
```javascript
// assets/js/webcam.js
Webcam.set({
  width: 1280,
  height: 720,
  image_format: 'jpeg',
  jpeg_quality: 90,
  fps: 30
});
Webcam.attach("#camera")
```
We define a capture function that takes a snapshot and sends a "frame" event, with the base64 encoded (data URI scheme) frame, to the WebcamChannel process.
```javascript
// assets/js/webcam.js
function capture() {
  Webcam.snap(function (data_uri, canvas, context) {
    channel.push("frame", { "frame": data_uri })
  });
}
```
On the back-end, when an image is processed, WebcamChannel sends a detected event with the detected objects to the frontend. When we receive a detected event on the front-end, draw_objects(result) is called and it renders labels and bounding boxes on the canvas.
```javascript
// assets/js/webcam.js

// listen to "detected" events and call draw_objects() for each event
channel.on("detected", draw_objects);

// our canvas element
let canvas = document.getElementById('objects');
let ctx = canvas.getContext('2d');

const boxColor = "blue";
// labels font size
const fontSize = 18;

function draw_objects(result) {
  let objects = result.objects;

  // clear the canvas from the previous rendering
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.lineWidth = 4;
  ctx.font = `${fontSize}px Helvetica`;

  // for each detected object render label and box
  objects.forEach(function(obj) {
    let width = ctx.measureText(obj.label).width;

    // box
    ctx.strokeStyle = boxColor;
    ctx.strokeRect(obj.x, obj.y, obj.w, obj.h);

    // white label + background
    ctx.fillStyle = boxColor;
    ctx.fillRect(obj.x - 2, obj.y - fontSize, width + 10, fontSize);
    ctx.fillStyle = "white";
    ctx.fillText(obj.label, obj.x, obj.y - 2);
  });
}
```
Clicking the Start/Stop button just starts and stops an interval that calls capture every 1000/FPS milliseconds. We start with FPS=1 (my laptop should be able to process 4 FPS with the YOLO OpenCV implementation).
```javascript
// assets/js/webcam.js

// toggle button starts and stops an interval
const FPS = 1; // frames per second
let intervalID = null;

document.getElementById("start_stop")
  .addEventListener("click", function() {
    if (intervalID == null) {
      intervalID = setInterval(capture, 1000 / FPS);
      this.textContent = "Stop";
    } else {
      clearInterval(intervalID);
      intervalID = null;
      this.textContent = "Start";
    }
  });

export default socket
// EOF
```
The browser asks for permission to use the camera – once accepted, it starts showing the video in the #camera element. The browser fails to join the channel, though… we still need to write the WebcamChannel in the backend.
Backend – WebcamChannel
We start with a really simple implementation of YoloWeb.WebcamChannel, making a detection request for every frame event.
We first update the YoloWeb.UserSocket module (in lib/yolo_web/channels/user_socket.ex), adding a channel route.
```elixir
# lib/yolo_web/channels/user_socket.ex
defmodule YoloWeb.UserSocket do
  use Phoenix.Socket

  ## Channels
  channel "webcam:*", YoloWeb.WebcamChannel

  ...
end
```
Then, we define the YoloWeb.WebcamChannel module in lib/yolo_web/channels/webcam_channel.ex.
```elixir
# lib/yolo_web/channels/webcam_channel.ex
defmodule YoloWeb.WebcamChannel do
  use Phoenix.Channel

  def join("webcam:detection", _params, socket) do
    {:ok, socket}
  end

  def handle_in("frame", %{"frame" => "data:image/jpeg;base64," <> base64frame}=_event, socket) do
    frame = Base.decode64!(base64frame)
    Yolo.Worker.request_detection(Yolo.Worker, frame)
    {:noreply, socket}
  end

  def handle_info({:detected, _image_id, result}, socket) do
    push(socket, "detected", result)
    {:noreply, socket}
  end
end
```
When a frame event is sent from the browser, handle_in/3 pattern matches the data URI, extracting the base64 encoded frame. After decoding base64frame, we send a detection request to Yolo.Worker without awaiting the result, which would block the channel process. Instead, when Yolo.Worker has finished processing the image and sends a {:detected, image_id, result} message to the channel process, handle_info/2 pushes a "detected" event with the result to the browser.
But what happens when, for any reason, Yolo.Worker can't process all the incoming frames fast enough?
It maybe doesn’t crash, but when unable to keep up with the requests it slows down the application, piling up requests and showing the results with visible delays.
It’s easy to simulate: my computer can’t run the full YOLOv3 model at 10fps on the CPU. Just increasing the FPS
constant to 10
in webcam.js
, we see how the tracking slows down immediately with delays of seconds.
Drop frames and dynamically adapt
To avoid exhausting Yolo.Worker, we can implement in WebcamChannel a simple mechanism that drops frames while Yolo.Worker is still busy processing a previous request.
```elixir
# lib/yolo_web/channels/webcam_channel.ex
defmodule YoloWeb.WebcamChannel do
  use Phoenix.Channel

  def join("webcam:detection", _params, socket) do
    socket =
      socket
      |> assign(:current_image_id, nil)
      |> assign(:latest_frame, nil)

    {:ok, socket}
  end

  def handle_in("frame", %{"frame" => "data:image/jpeg;base64," <> base64frame}=_event,
                %{assigns: %{current_image_id: image_id}}=socket) do
    if image_id == nil do
      {:noreply, detect(socket, base64frame)}
    else
      {:noreply, assign(socket, :latest_frame, base64frame)}
    end
  end

  # only the result of the current_image_id
  def handle_info({:detected, image_id, result}, %{assigns: %{current_image_id: image_id}}=socket),
    do: handle_detected(result, socket)

  # skipping results we are not waiting for
  def handle_info({:detected, _, _}, socket), do: {:noreply, socket}

  def detect(socket, b64frame) do
    frame = Base.decode64!(b64frame)
    image_id = Yolo.Worker.request_detection(Yolo.Worker, frame)

    socket
    |> assign(:current_image_id, image_id)
    |> assign(:latest_frame, nil)
  end

  def handle_detected(result, socket) do
    push(socket, "detected", result)

    socket =
      socket
      |> assign(:current_image_id, nil)
      |> detect_if_need()

    {:noreply, socket}
  end

  def detect_if_need(socket) do
    if socket.assigns.latest_frame != nil do
      detect(socket, socket.assigns.latest_frame)
    else
      socket
    end
  end
end
```
When the browser joins the channel, we assign a nil value to current_image_id and latest_frame. In current_image_id we set the id returned by Yolo.Worker.request_detection/2, and in latest_frame we keep the latest received frame.
When the channel receives a new frame, if current_image_id is not nil it means that Yolo.Worker is still processing a frame for us. So we just keep the frame in latest_frame, without making any detection request.
If current_image_id is nil, it means we can call detect/2, which makes a detection request, assigns a new current_image_id and returns an updated socket.
When the result of a detection is ready, the handle_info({:detected, image_id, result}, socket) callback is called. We make sure that the result's image_id is equal to current_image_id.
handle_detected/2 sends result to the browser and sets current_image_id to nil. If latest_frame isn't nil, detect_if_need/1 immediately sends that frame to Yolo.Worker with a new detection request.
Let’s now try to set FPS
to 20fps
, and see what happens.
We see that WebcamChannel is much more reactive than the previous implementation. It just processes frames at the Yolo.Worker pace, skipping the rest. The server is local, so we obviously don't suffer from any network delays! (The real fps is ~4, 0.25s per image.)
It’s obviously just an initial implementation and we could add many other features. For example, if Yolo.Worker
crashes while processing a frame, WebcamChannel
will continue to drop frames waiting for a detection result – something we could solve with a detection timeout mechanism.
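A minimal sketch of such a timeout mechanism (my own addition, not part of the article's repo; the message name and the 2-second value are arbitrary): when we request a detection we also schedule a timeout message to ourselves, and if it fires while we are still waiting on that image, we simply forget the request so the next frame can be processed.

```elixir
# Hypothetical additions to YoloWeb.WebcamChannel
@detection_timeout 2_000

def detect(socket, b64frame) do
  frame = Base.decode64!(b64frame)
  image_id = Yolo.Worker.request_detection(Yolo.Worker, frame)

  # schedule a timeout for this specific detection
  Process.send_after(self(), {:detection_timeout, image_id}, @detection_timeout)

  socket
  |> assign(:current_image_id, image_id)
  |> assign(:latest_frame, nil)
end

# the detection we are waiting for took too long:
# give up and process the latest frame, if any
def handle_info({:detection_timeout, image_id}, %{assigns: %{current_image_id: image_id}}=socket) do
  socket =
    socket
    |> assign(:current_image_id, nil)
    |> detect_if_need()

  {:noreply, socket}
end

# timeout for a request we already handled: ignore it
def handle_info({:detection_timeout, _image_id}, socket), do: {:noreply, socket}
```

If the late result eventually arrives, it no longer matches current_image_id, so it is skipped by the existing catch-all handle_info({:detected, _, _}, socket) clause.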
Tiny YOLO
Ah, wait… there is still the yolov3-tiny model to try with the webcam – my computer can run it at more than 10fps.
$ YOLO_MODEL="yolov3-tiny" mix phx.server
What’s next?
The YOLO OpenCV implementation runs much faster on the CPU than the original YOLO Darknet. But Darknet really shines when compiled with CUDA and run on an Nvidia GPU – I'm really tempted to buy an eGPU!
It’s simple and fun to use a browser and HTML5 to get webcam’s frames and to show the tracked objects. But as soon as we try to reach >= 30fps we see that this solution has toll on the overall performance. 30fps means that we have max ~30ms to send the frame and receive a result. On the browser, just making a snapshot and encoding it to base64 takes ~7ms, than to decode the base64 image to a binary is another ~5ms… when we have a 30ms restriction all these ms become precious. So, I’ll try to get camera frames using OpenCV directly.
In the next few weeks I want to try to use Darknet on the Jetson Nano I've just bought; it should easily reach 4fps with the full YOLOv3 (like my laptop with an i9 CPU!). Since Darknet is written in C, I'm thinking of writing a NIF. To render the frames and detected objects I could use Phoenix or Scenic! More on this in further articles!
As I briefly said at the beginning, I’ve also explored other ways to talk with Python, like Pyrlang for example, which deserves an article on its own.
A very special thanks to Evadne Wu, who gave me great advice and feedback!