Benchmarking Debian vs. Alpine as a Base Docker Image

栏目: IT技术 · 发布时间: 5年前

内容简介：Updated on July 10th, 2018 in#dockerQuick Jump:Why Am I Comparing These 2 Base Images Now?Ever since Docker announced they were gravitating towards using Alpine in their official base Docker images, I hopped on board and embraced Alpine.

Updated on July 10th, 2018 in#docker

Benchmarking Debian vs Alpine as a Base Docker Image

Most official Docker images offer both Debian and Alpine based images but there's some surprising performance results between the 2.

Quick Jump:Why Am I Comparing These 2 Base Images Now? | Testing Both Base Images

Ever since Docker announced they were gravitating towards using Alpine in their official base Docker images, I hopped on board and embraced Alpine.

I mean, what’s not to love. It’s a super minimal distribution of Linux with an extremely small attack surface. It seems like a perfect match to run it as a base image inside of a container.

I wrote about how awesome Alpine was over a year ago and have personally been using it all this time without any show stopping issues.

Why Am I Comparing These 2 Base Images Now?

I definitely didn’t wake up today thinking “gee willikers, I wonder how Debian stacks up to Alpine as a base Docker image in 2018” .

What really happened was, I put together an article which went over a few Docker best practices about a month ago and it made its way onto Reddit.

To my surprise it received over 150 upvotes, a 99% upvote rating and a ton of engagement from the Reddit community .

To be honest, that’s why I wrote the post. I’ve been using Docker for a really long time and picked up what I think are some neat patterns over the years, but I really wanted to see what others have been doing so I can improve. I’m constantly learning from others.

Discovering 2 Negative Trends about Alpine

If you go through the Reddit comments you’ll find a number of people commenting about how DNS was randomly failing in Alpine and others commented about poor runtime performance for certain types of activities (such as a web app connecting to a database).

DNS lookup issues:

I’m not one to blindly follow 1 person’s comments, but once a few people start to mention the same thing it starts to get interesting. Still, there’s no way I would trust a few comments without some type of scientific proof.

Fortunately someone linked to a GitHub issue related to Alpine which contains more information. Perhaps this bug will be fixed in due time, but it does look like there’s something weird going on with DNS lookups inside of Alpine.

It may also be related to a 9 year old bug in BusyBox (Alpine uses this OS under the hood).

I haven’t experienced this personally, but I’m also not running anything at rofl scale which uses massive Kubernetes clusters, so maybe I’m just not affected by this problem.

Fortunately for us, Alpine has documentation on why DNS lookups might fail and it also explains why I never experienced this issue. Good to know.

Runtime performance woes with common web app cases:

Another Reddit user mentioned their Node app ran 15% slower when using Alpine as a base image compared to Debian. He also mentioned his Python apps were slower too.

This Reddit commenter even said they had a 35% difference in speed for real world test suites where they run 500-700 unit tests a day. That really caught my eye.

Of course, I replied to him asking for more specific information because a line like “15% slower” doesn’t say much without knowing the context of the situation.

We had a couple of back and forths and he supplied some hard numbers where he had a Python application running in a container perform 10,000 database selects in PostgreSQL which was running in a different container.

# postgres:9.6.3 and python 2.7

Total test time 15.3489780426 seconds
Total test time 13.5786788464 seconds
Total test time 14.2057600021 seconds


# postgres:9.6.3-alpine and python 2.7

Total test time 14.262032032 seconds
Total test time 13.7757499218 seconds
Total test time 14.1344659328 seconds


# postgres 9.6.3 and python:2.7-alpine

Total test time 18.1418809891 seconds
Total test time 16.0904250145 seconds
Total test time 17.1380209923 seconds

What this says is the Postgres image runs just as fast in both Alpine and Debian but when an Alpine based Python image tries to connect to it, there is a very noticeable slow down.

That immediately made me think that maybe certain system level packages are the culprit here. For example most popular PostgreSQL connection libraries in Python and Ruby (and other languages) require installing libpq-dev on Debian and postgresql-dev in Alpine.

This is a package that needs to be installed in your image to use packages like psycopg2 in Python and pg in Ruby. They are what let you connect to PostgreSQL from your web app.

And here we are. Now we have something to test and a real reason to compare both of these base images.

Testing Both Base Images

Before we get into the benchmarks, it’s worth pointing out that I’m not a fan of contrived microbenchmarks. Sure, they are good for testing regressions at the language / framework level, but are pretty useless for testing real world scenarios.

I always roll my eyes when people are like, “well the flim flam web framework can do 163,816 requests per second but bippity bop only does 97,471 reqs / second, it’s trash!” .

Then you look into what the benchmark does and all it does is return an empty 200 response. As soon as you do something like serialize JSON or return HTML from a template suddenly, both frameworks become way slower.

And then you factor in something like a single database call and hey what do you know, both frameworks drop an order of magnitude in speed.

Benchmarking a Real World Web Application

I’m going to take the most up to date and complete version of my Flask course’s code base, which is running Flask 1.0.2 and all of the latest library versions at the time of writing this article.

It’s a decently sized Flask app with 4,000+ lines of code, a couple of blueprints, backed by a SQLAlchemy driven PostreSQL database and also has a bunch of other moving parts.

The test environment:

The test case will be to load up a page that uses a server rendered template to return HTML and also includes performing 1 SELECT statement to lookup a user by ID in the database.

The Flask app was configured with:

gunicorn with Flask in production mode (not debug mode)
gunicorn running with 8 threads (gthread) and 8 workers
Logging level INFO

The test system (my dev box) was configured with:

i5 3.2GHz CPU with 16GB of RAM and an SSD
Docker Compose to launch the project with no app bind volumes
Docker for Windows to run the Docker daemon (rebooted between each test)
WSL to run the Docker CLI

The general testing strategy:

I’m going to run an HTTP benchmark tool called wrk to make requests to the Flask app. I will run it 5 times for each base image and take the fastest result.

I will perform this test using both Debian and Alpine as a base image with no other changes to the code base, app configuration, how the app is ran or the test machine.

I will be running docker-compose up -d to launch the Compose project in the background and run wrk -t8 -c50 -d30 <url to test> to launch wrk with 8 threads while keeping open 50 HTTP connections to the page. The test will run for 30 seconds.

Spoiler alert: in both test cases my CPU maxed out at about 65%, so we’re not CPU bound.

Debian Set up

Here’s the Dockerfile I used:

FROM python:2.7-slim
LABEL maintainer="Nick Janetakis <nick.janetakis@gmail.com>"

RUN apt-get update && apt-get install -qq -y \
  build-essential libpq-dev --no-install-recommends

WORKDIR /app

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY . .
RUN pip install --editable .

CMD gunicorn -c "python:config.gunicorn" "snakeeyes.app:create_app()"

Here’s the benchmark results:

nick@workstation:/e/tmp/bsawf$ docker-compose up -d
Recreating snakeeyestest_celery_1  ... done
Starting snakeeyestest_redis_1     ... done
Starting snakeeyestest_postgres_1  ... done
Recreating snakeeyestest_website_1 ... done

nick@workstation:/e/tmp/bsawf$ wrk -t8 -c50 -d30 localhost:8000/subscription/pricing
Running 30s test @ localhost:8000/subscription/pricing
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   167.45ms  171.80ms   1.99s    87.87%
    Req/Sec    39.13     29.99   168.00     74.85%
  8468 requests in 30.05s, 56.17MB read
  Socket errors: connect 0, read 0, write 0, timeout 42
Requests/sec:    281.83
Transfer/sec:      1.87MB

Alpine Set up

Here’s the Dockerfile I used:

FROM python:2.7-alpine
LABEL maintainer="Nick Janetakis <nick.janetakis@gmail.com>"

RUN apk update && apk add build-base postgresql-dev

WORKDIR /app

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY . .
RUN pip install --editable .

CMD gunicorn -c "python:config.gunicorn" "snakeeyes.app:create_app()"

Here’s the benchmark results:

nick@workstation:/e/tmp/bsawf$ docker-compose up -d
Recreating snakeeyestest_celery_1  ... done
Starting snakeeyestest_redis_1     ... done
Starting snakeeyestest_postgres_1  ... done
Recreating snakeeyestest_website_1 ... done

nick@workstation:/e/tmp/bsawf$ wrk -t8 -c50 -d30 localhost:8000/subscription/pricing
Running 30s test @ localhost:8000/subscription/pricing
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   223.56ms  270.60ms   1.96s    88.41%
    Req/Sec    39.53     25.02   140.00     65.12%
  8667 requests in 30.05s, 57.49MB read
  Socket errors: connect 0, read 0, write 0, timeout 20
Requests/sec:    288.38
Transfer/sec:      1.91MB

Let’s Talk about the Results

That’s a bit anticlimactic. I expected Debian to blow the doors off Alpine and then go on a grand crusade to change everything I use to Debian.

If you account for system variance I would be ok with saying both distributions performed about the same, although the stdev with Alpine is a little high, but on the other hand, Debian had more timeouts. Both of which could still be variance with such a short test.

I don’t doubt those Reddit commenters in the results they had, and if anything this test shows that not all test cases are created equal.

One thing I will point out is the Reddit user’s test performed 10,000 SELECT statements where as my app performed 1 SELECT statement. That is a very big difference. I wonder what would happen on a more complex page with let’s say 15 database queries instead of 1.

If you would like to pick up the torch and create more test cases across different languages or Python versions, then please do so and let us know how it goes.

For now I’m going to continue using Alpine but if I ever run into a problem or see a compelling reason to switch then I will make the switch to Debian.