Tag Archives: Performance

Performance Analysis on Embedded Linux with perf and hotspot

This is about profiling your applications on your embedded Linux target or let’s say finding the spots of high CPU usage, which is a common concern in practice. For an extensive overview see Linux Performance by Brendan Gregg. We will focus on viewing flame graphs with a tool called hotspot here, based on performance data recorded with perf. This proved to be helpful enough to solve most of the performance issues I had lately.

Installing the Tools

We have two sides here: the embedded Linux target and your Linux workstation host. For your computer you need to install hotspot. In Debian it is available from version 10 (buster). You can build from source of course, I did that with Debian 9 (stretch) a while ago. IIRC there are instructions for that upstream. Or you build it from the deb-src package from Debian unstable (sid) by following this BuildingTutorial.1

The embedded target part needs basically two parts. You have to set some options in the kernel config and you need the userland tool perf. For ptxdist here’s what I did:

  • Add -fno-omit-frame-pointer to global CFLAGS and CXXFLAGS
  • Enable PTXCONF_KERNEL_TOOL_PERF

Note: I had to update my kernel from v4.9 to v4.14, otherwise I got build errors when building perf.

Configuring the Kernel

I won’t quote the whole kernel config here, but I have a diff on what I had to set to make perf record useful things. These options are probably important, at least I had those on in my debug sessions (others might also be needed):

  • CONFIG_KALLSYMS
  • CONFIG_PERF_EVENTS
  • CONFIG_UPROBES
  • CONFIG_STACKTRACE
  • CONFIG_FTRACE

Using it

For embedded use, I basically followed the instructions of upstream hotspot. You might however want to dive a little into the options of perf, because it is a very powerful tool. What I did to record on the target was basically this to get samples from my daemon application mydaemon for 30 seconds:

This can produce quite a lot of data, so use it with short times first to not fill your filesystem. Luckily I had enough space on the flash memory of the embedded target available. Then just follow what the hotspot README says: copy the file and your kernel symbols to your host and call hotspot with the right options to your sysroot. This was the call I used (from a subfolder of my ptxdist BSP, where I copied those files to):

Happy performance analysing!

  1. I did not test that with hotspot []