My Docker image for an IoT device is huge!

Optimizing docker images for Balena cloud.

Recently, I was working on IoT devices. My setup is a Raspberry Pi plus an OAK device, managed with Balena Cloud. While developing an application for this setup, I had to write a Dockerfile, because Balena uses a Dockerfile to build images and run containers on the connected Pi devices. If you do not know Balena, it is a cloud platform for deploying and managing fleets of IoT devices. You can learn more about Balena on their official website. The basic architecture of Balena is illustrated below. When a change is pushed to the devices, the images are not built in Balena Cloud; they are built in the development environment and then pushed to the devices through Balena Cloud. Finally, the images are downloaded onto the IoT devices.

The problem arises because many IoT devices use cellular networks and may not always be connected to WiFi. Whenever you want to send an update to a device, an entire new image has to be pushed, and a large image means a large transfer over a slow or metered connection. Services like Balena do provide a way to reduce the size of updates using delta updates. Even so, it is very useful to keep Docker images small in general.

Let’s jump into the Dockerfile.

FROM balenalib/%%BALENA_ARCH%%-debian-python:3.10-bullseye-build
ENV UDEV=1
ENV DEBIAN_FRONTEND=noninteractive
# Build tools needed to compile Python dependencies
RUN install_packages git gcc python3-dev
RUN pip3 install --upgrade pip
# Copy the project first; "pip3 install ." needs the source to be present
WORKDIR /app/
COPY . .
RUN pip3 install .
RUN pip3 cache purge
CMD ["python", "main.py"]

Notice that the base image is a build variant. Build variants come with the build tools, compilers, libraries, and other dependencies needed to install packages. This makes installing your application's dependencies easier, but these images are usually larger, and the final image can end up bigger than anticipated.

This template also installs packages like git, gcc, and python3-dev. Depending on your application, these may not be required at runtime at all. So why clutter your Docker container with things that are not needed to run the Python program?
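One more detail in the Dockerfile above: pip3 cache purge runs in its own RUN instruction. Each instruction creates a separate image layer, so files written by an earlier layer still count toward the image size even if a later layer deletes them. A minimal sketch of one way around this, assuming the project is copied into /app/ before it is installed, is to tell pip not to keep a cache in the first place:

```dockerfile
FROM balenalib/%%BALENA_ARCH%%-debian-python:3.10-bullseye-build
ENV UDEV=1
ENV DEBIAN_FRONTEND=noninteractive
RUN install_packages git gcc python3-dev
WORKDIR /app/
COPY . .
# Upgrade pip and install the project in a single layer,
# without keeping pip's download cache in the image
RUN pip3 install --no-cache-dir --upgrade pip \
    && pip3 install --no-cache-dir .
CMD ["python", "main.py"]
```

This only trims the pip cache. The compilers and headers from the build variant are still shipped in the image.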

Now, let’s look at how a multi-stage build can alleviate some of these problems.

# Build stage
FROM balenalib/%%BALENA_ARCH%%-debian-python:3.10-bullseye-build AS b1
ENV DEBIAN_FRONTEND=noninteractive
RUN install_packages git gcc python3-dev
RUN pip3 install --upgrade pip
# Copy the project so that "pip3 install ." has something to install
WORKDIR /app/
COPY . .
RUN pip3 install .

# Run stage
FROM balenalib/%%BALENA_ARCH%%-debian-python:3.10-bullseye-run
WORKDIR /app/
# Bring over only the installed libraries, not the build tools
COPY --from=b1 /usr/local/lib/ /usr/local/lib/
COPY --from=b1 /lib/ /lib/
COPY . .

CMD ["python", "main.py"]

The first thing to do is to separate what is needed to build your application from what is needed to run it. Notice that the example above uses two different base images from the Balena repository: build for installing dependencies, and run for running the application.

Build stage

In this stage, we use the build variant of the base image and install the necessary packages, including Python packages (for Node projects, we would install Node packages here). Sometimes Python packages require additional system dependencies; those should also be installed in this stage. For simplicity, I have named this stage b1.

Run stage

In this stage, notice that we use the run variant of the base image. The next step is to copy the required dependencies from b1 into this final stage. Note that we do not copy the build tools, only the Python packages required to run the application. You also need to know where these packages are installed. In this case, these two lines should do the job:

COPY --from=b1 /usr/local/lib/ /usr/local/lib/
COPY --from=b1 /lib/ /lib/

In my experiment, the size of the Docker image dropped from 1.2GB to around 600MB. That is about half!
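If copying /usr/local/lib/ and /lib/ wholesale feels too broad, a common alternative is to install the Python packages into a virtual environment in the build stage and copy only that directory into the run stage. The following is a minimal sketch, not what I measured above. It assumes that the venv module is available in the build image and that both images ship the same Python interpreter at the same path. The /opt/venv location is an arbitrary choice for illustration.

```dockerfile
# Build stage: compile and install dependencies into an isolated virtual environment
FROM balenalib/%%BALENA_ARCH%%-debian-python:3.10-bullseye-build AS b1
ENV DEBIAN_FRONTEND=noninteractive
RUN install_packages git gcc python3-dev
# /opt/venv is a hypothetical path chosen for this sketch
RUN python3 -m venv /opt/venv
# Put the virtual environment first on PATH so pip installs into it
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app/
COPY . .
RUN pip install --no-cache-dir .

# Run stage: copy only the virtual environment and the application code
FROM balenalib/%%BALENA_ARCH%%-debian-python:3.10-bullseye-run
COPY --from=b1 /opt/venv /opt/venv
# "python" now resolves to the interpreter inside the virtual environment
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app/
COPY . .
CMD ["python", "main.py"]
```

With this layout, everything the application needs from the build stage lives under one directory. If a package links against a system library that was installed in the build stage, the matching runtime library would still need to be installed in the run stage with install_packages.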

In this article, we delved into optimizing Docker images for IoT devices using Balena. Balena Cloud requires deploying updates as Docker images, which can become quite large, particularly for devices relying on cellular networks. To address this, I applied multi-stage builds in my Dockerfile. In the first build stage, I used a base image with development tools to handle the installation of dependencies and packages. In the second run stage, I switched to a minimal runtime image and transferred only the necessary files and libraries from the build stage. This approach reduced the Docker image size from 1.2GB to around 600MB, effectively cutting the size of updates and improving deployment efficiency.