How to: run BpfTrace from a small alpine image, with least privileges.

Paulo Gomes
Published in ITNEXT
6 min read, Oct 26, 2019


Photo by Sai Kiran Anagani on Unsplash

BpfTrace is a high-level tracing language for Linux eBPF, which allows you to pull information that tends to be quite handy in performance and security investigations. For more information on bpftrace, check the official reference guide.

TL;DR:

Here's how to run bpftrace straight out of an alpine container image.

docker run --rm -t -v /sys/kernel/debug:/sys/kernel/debug:ro --cap-add=SYS_ADMIN --security-opt no-new-privileges paulinhu/bpftrace:alpine sh -c "/init.sh && bpftrace -e 'kprobe:do_sys_open { printf(\"%s: %s\n\", comm, str(arg1)) } interval:s:2 { exit(); }'"

If the above does not work for you, either your kernel has lock-down enabled or your kernel major version is not compatible with the one I built the docker image against (5.x). Troubleshooting steps are provided at the bottom of this page.

Running that container should result in a list of the commands running on the host machine and the files each one opened during the two-second trace window.

Note that the only folder mounted into the container is in read-only mode. Also, in the example above I am not running with a custom seccomp profile. Further down this post I show a custom seccomp profile and how to use it, assuming you are that way inclined. :)

The long story…

I could not find any alpine image with bpftrace that I could simply download and use. I found that a bit strange, but when I tried to build my own I realised why. Here are the basic challenges in this endeavour:

1. No stable bpftrace package for alpine

Currently, the only bpftrace package available for alpine is in the testing channel. I prefer not to rely on testing packages, as they may change and break your docker build command.

2. Dependencies not easily available for alpine

If the focus was on Ubuntu, you could generate an image with bpftrace from a two-line Dockerfile:

FROM ubuntu:disco as build
RUN apt update && apt install -y bpftrace linux-headers-$(uname -r)

The story is not as simple for "alpiners".

3. Bpftrace depends on the host machine's kernel

At runtime, bpftrace looks up the kernel headers using the kernel release name of the target host machine. That means that if you build an image on a machine running 5.0.0-32-generic and try to run it on a server running 5.0.0-1020-gcp, it will fail to run.

Building a light bpftrace image

Most of the problems above can be resolved by building all dependencies from source. Using a multi-stage build, all the source code can be downloaded, together with its dependencies and build dependencies. Then, at the end, just the required static files are copied into an alpine:latest image. The resulting Dockerfile looks like this:

Note that to resolve the run-time dependency on the kernel release name, I create an init.sh script that symlinks the host machine's kernel release name to the one created at image build time. Unless they are vastly different, bpftrace should not complain. Obviously, types change across kernel releases, so you should strive to build for the target machine. However, with this approach you should get away with a fair amount of drift.
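As a rough illustration of that symlinking step, here is a minimal sketch of what such a script could do. It is not the init.sh shipped in the image: the link_kernel_release function, the BUILD_RELEASE variable and the assumption that the build-time headers sit under /lib/modules/<release> are all hypothetical, picked because that is where bpftrace conventionally looks for kernel headers.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the init.sh shipped in the image.
# Assumes the headers were installed under MODULES_DIR/BUILD_RELEASE at build time.

# link_kernel_release MODULES_DIR BUILD_RELEASE HOST_RELEASE
link_kernel_release() {
    modules_dir=$1
    build_release=$2
    host_release=$3

    # Nothing to do when the image was built for this exact kernel release.
    if [ "$build_release" = "$host_release" ]; then
        return 0
    fi

    # Fail early if the build-time directory is missing.
    if [ ! -d "$modules_dir/$build_release" ]; then
        echo "missing $modules_dir/$build_release" >&2
        return 1
    fi

    # Create the alias only if the host release has no entry yet.
    if [ ! -e "$modules_dir/$host_release" ]; then
        ln -s "$modules_dir/$build_release" "$modules_dir/$host_release"
    fi
}

# Example call (BUILD_RELEASE is an assumed variable baked in at build time):
# link_kernel_release /lib/modules "$BUILD_RELEASE" "$(uname -r)"
```

Taking the modules directory as a parameter keeps the function easy to try against a scratch folder before pointing it at /lib/modules.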

Using the Dockerfile above, and assuming the machine you are targeting is similar to your local one, you can build the image using:

docker build -t my-bpftrace:alpine \
--build-arg KERNEL_VERSION=$(uname -r | awk -F- '{ print $1 }') \
--build-arg KERNEL_RELEASE=$(uname -r) \
.
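To make the two build arguments concrete, the sketch below splits a kernel release name the same way the awk call above does. The kernel_version helper is a hypothetical name used only for illustration, and the release value is a sample rather than something read from your machine.

```shell
# kernel_version RELEASE: print the part before the first hyphen,
# mirroring: uname -r | awk -F- '{ print $1 }'
kernel_version() {
    printf '%s\n' "${1%%-*}"
}

release="5.0.0-32-generic"      # sample value; normally $(uname -r)
echo "KERNEL_RELEASE=$release"
echo "KERNEL_VERSION=$(kernel_version "$release")"
```

With the sample value, KERNEL_RELEASE keeps the full name while KERNEL_VERSION is reduced to 5.0.0.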

Once built, you can execute on the same machine by using the below:

docker run --rm -t -v /sys/kernel/debug:/sys/kernel/debug:ro --cap-add=SYS_ADMIN --security-opt no-new-privileges my-bpftrace:alpine bpftrace -e 'kprobe:do_sys_open { printf("%s: %s\n", comm, str(arg1)) } interval:s:2 { exit(); }'

If you need to run on a different machine that uses a different kernel release name, ensure you execute /init.sh first. In that case, also make sure to escape any double quotes:

docker run --rm -t -v /sys/kernel/debug:/sys/kernel/debug:ro --cap-add=SYS_ADMIN --security-opt no-new-privileges my-bpftrace:alpine sh -c "/init.sh && bpftrace -e 'kprobe:do_sys_open { printf(\"%s: %s\n\", comm, str(arg1)) } interval:s:2 { exit(); }'"

The result is an 80 MB container image, which becomes 27 MB once compressed.

That's it! At this point you have a light and simple image to run your bpftrace commands. The bits below are optional: you can skip them if you don't care about seccomp and did not have any issues along the way. :)

For the full code to build the image, check out this repo.

Restrict the container image with seccomp

BpfTrace must be executed with the CAP_SYS_ADMIN capability and also needs (read-only) access to the /sys/kernel/debug folder. Using a custom seccomp profile helps to decrease the attack surface, which is not small when running with such a capability.

Here's the profile I have been using:

The seccomp profile can be referenced by adding --security-opt seccomp=profile-name.json. The final command should look like:

docker run --rm -t -v /sys/kernel/debug:/sys/kernel/debug:ro --cap-add=SYS_ADMIN --security-opt seccomp=bpftrace.json --security-opt no-new-privileges paulinhu/bpftrace:alpine sh -c "/init.sh && bpftrace -e 'kprobe:do_sys_open { printf(\"%s: %s\n\", comm, str(arg1)) } interval:s:2 { exit(); }'"

With the seccomp profile applied, the container is restricted to the system calls that I found were required to run bpftrace commands, and won't be able to use any others.
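One way to check that a seccomp filter is actually in effect is to read the Seccomp field of the container process's /proc/<pid>/status from the host. The sketch below is only an illustration under assumptions: seccomp_mode is a hypothetical helper, and the container name in the usage comment is a placeholder.

```shell
# seccomp_mode STATUS_FILE: print the seccomp mode recorded in a
# /proc/<pid>/status file (0 = disabled, 1 = strict, 2 = filter).
seccomp_mode() {
    awk '$1 == "Seccomp:" { print $2 }' "$1"
}

# Usage from the host (container name is a placeholder):
# pid=$(docker inspect -f '{{.State.Pid}}' my-container)
# seccomp_mode "/proc/$pid/status"
```

Keep in mind that Docker's default profile also runs in filter mode, so a value of 2 only tells you that some filter is active, not which one.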

Playing around with bpftrace

There are loads of one-liners that can give you a deeper insight into what's going on in your servers. Covering them all goes beyond the scope of this post, but here's an extract from the examples in bpftrace's repo:

# Syscall count by program
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# Read bytes by process:
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret/ { @[comm] = sum(args->ret); }'

# Show per-second syscall rates:
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ = count(); } interval:s:1 { print(@); clear(@); }'

# Trace disk size by process
bpftrace -e 'tracepoint:block:block_rq_issue { printf("%d %s %d\n", pid, comm, args->bytes); }'

# Count page faults by process
bpftrace -e 'software:faults:1 { @[comm] = count(); }'

# Files opened by process
bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'

Whilst running them, you can always define a timeout by adding at the end interval:s:1 { exit(); }. Also make sure you go through the official one-liner tutorial.
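If you find yourself adding that timeout often, a small shell helper can append it for you. This is a minimal sketch; with_timeout is a hypothetical name, not something provided by bpftrace or by the image.

```shell
# with_timeout PROGRAM SECONDS: append an interval probe that exits
# the bpftrace program after the given number of seconds.
with_timeout() {
    printf '%s interval:s:%s { exit(); }\n' "$1" "$2"
}

prog='tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
with_timeout "$prog" 5
# To run it: bpftrace -e "$(with_timeout "$prog" 5)"
```

The program is passed as an argument to printf rather than as its format string, so any % or backslash inside the bpftrace code is left untouched.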

Troubleshooting

Kernel lock-down enabled

This will be the default in some desktop distros, such as Ubuntu. Before trying to disable it, ensure you are running the container with CAP_SYS_ADMIN.

Symptom
When executing the container you get the error below:

No error information: couldn't set RLIMIT_MEMLOCK for bpftrace. If your program is not loading, you can try "ulimit -l 8192" to fix the problem
Error creating printf map: Operation not permitted
Creation of the required BPF maps has failed.
Make sure you have all the required permissions and are not confined (e.g. like snapcraft does). `dmesg` will likely have useful output for further troubleshooting

Workaround
Disable kernel lock-down:

sudo bash -c 'echo 1 > /proc/sys/kernel/sysrq'
sudo bash -c 'echo x > /proc/sysrq-trigger'
# https://bugzilla.redhat.com/show_bug.cgi?id=1599197
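Before and after applying the workaround, it can help to check which lock-down mode is active. The sketch below assumes your kernel exposes the state at /sys/kernel/security/lockdown, which mainline kernels do from 5.4 onwards; older distro-patched kernels may not have that file. The lockdown_mode helper is a hypothetical name.

```shell
# lockdown_mode FILE: print the active mode, i.e. the bracketed word in
# a line such as "none [integrity] confidentiality".
lockdown_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$1"
}

# Usage on a host that exposes the file (assumption, see above):
# lockdown_mode /sys/kernel/security/lockdown
```

A result of none means lock-down is not what is blocking bpftrace, so the other troubleshooting entries are the next place to look.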

I can't execute commands inside the container

Running the image with the seccomp profile above will restrict what can be executed within that container.

Symptom
Running an ls command results in several errors.

Workaround
There are two options here: 1) customise the seccomp profile to your actual needs. 2) do not use the seccomp profile.

Fatal error: `linux/types.h` file not found

BpfTrace could not find the kernel headers.

Symptom

No error information: couldn't set RLIMIT_MEMLOCK for bpftrace. If your program is not loading, you can try "ulimit -l 8192" to fix the problem
/bpftrace/include/asm_goto_workaround.h:14:10: fatal error: 'linux/types.h' file not found

Workaround
Most probably the /init.sh command was not executed.

/init.sh

If the problem persists after the above, check whether the kernel on which the image was built is compatible with the kernel on which it is being executed.
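A quick first check is whether both kernels share the same major version. The helpers below are a hypothetical sketch: kernel_major and same_major are illustrative names, BUILD_RELEASE stands in for whatever release the image was built against, and a matching major version is a hint rather than a guarantee of compatibility.

```shell
# kernel_major RELEASE: print the leading major number of a release name.
kernel_major() {
    printf '%s\n' "${1%%.*}"
}

# same_major BUILD_RELEASE HOST_RELEASE: succeed when both share a major version.
same_major() {
    [ "$(kernel_major "$1")" = "$(kernel_major "$2")" ]
}

# BUILD_RELEASE is a placeholder for the release the image was built against.
# same_major "$BUILD_RELEASE" "$(uname -r)" || echo "kernel major version differs" >&2
```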

For general errors, check the BCC FAQ.


Software craftsman on the eternal learning path towards (hopefully) mastery. Security enthusiast keen on SecDevOps. My opinions are my own.