Better living through software

Ben Hutchings's diary of life and technology

Email: • Twitter: @benhutchingsuk • Debian: benh • Gitweb: • Github:

Mon, 30 Sep 2019

Kernel Recipes 2019, part 1

This conference only has a single track, so I attended almost all the talks. This time I didn't take notes but I've summarised all the talks I attended.

Updated: Noted slides are available for all talks. Added links to the video streams.

ftrace: Where modifying a running kernel all started

Speaker: Steven Rostedt

Details and slides:

Video: Youtube

This talk explains how the kernel's function tracing mechanism (ftrace) works, and describes some of its development history.

It was quite interesting, but you probably don't need to know this stuff unless you're touching the ftrace implementation.

Analyzing changes to the binary interface exposed by the Kernel to its modules

Speakers: Dodji Seketeli, Jessica Yu, Matthias Männich

Details and slides:

Video: Youtube

The upstream kernel does not have a stable ABI (or API) for use by modules, but OS distributors often want to support the use of out-of-tree modules by ensuring that at least some subset of the kernel ABI remains stable within a given OS release.

Currently the kernel build process generates a "version" or "CRC" for each exported symbol by parsing the relevant type definitions. There is a load-time ABI check based on comparing these, and distributors can compare them at build time to detect ABI breaks. However this doesn't work that well and it's hard to work out what caused a change.

The speaker develops the "libabigail" library and tools. These can extract ABI definitions from standard debug information (DWARF), and then analyse and compare ABIs for different versions of a shared libraries, or of the Linux kernel and modules. They are likely to replace the kernel's current symbol versioning approach at some point. He talked about the capabilities of libabigail, plans for improving it, and some limitations of C ABI checkers.

BPF at Facebook

Speaker: Alexei Starovoitov

Details and slides:

Video: Youtube

The Berkeley Packet Filter (BPF) is a simple virtual machine implemented by several kernels. It allows user-space to add code that runs in kernel context, without compromising the integrity of the kernel.

In recent years Linux has extended this virtual machine architecture to create eBPF, which is expressive enough to be targeted by general-purpose compilers such as Clang and (in the near future) gcc. eBPF can be used for filtering network packets (the original purpose of BPF), tracing events, and many other purposes.

The speaker talked about practical experiences using eBPF with tracing at Facebook. These mainly involved investigating performance problems. He also talked about the difficulties of doing this on production servers without developer tools installed, and how this is being addressed.

Kernel hacking behind closed doors

Speaker: Thomas Gleixner

Details and slides:

Video: Youtube

The speaker talked about how kernel developers and hardware vendors have been handling speculative execution vulnerabilities, and the friction between how the vendors' preferred process and the usual kernel development processes.

He described the mailing list manager he wrote to support discussion of security issues with a long embargo period, which sends and receives encrypted messages in both S/MIME and PGP/MIME formats (depending on the subscriber).

Finally he talked about the process that has been settled on for handling future issues of this time with minimal legal paperwork.

This was somewhat marred by a lawyer joke and a generally combative attitude to hardware vendors.

What To Do When Your Device Depends on Another One

Speaker: Rafael Wysocki

Details and slides:

Video: Youtube

The Linux device model represents all devices as a simple hierarchy. Driver binding and unbinding (probe/remove), and power management operations, are sequenced based on the assumption that a device only depends on its parent in the device model.

On PCs, additional dependencies are often hidden behind abstractions such as ACPI, so that Linux does not need to be aware of them. On most embedded systems, however, such abstractions are usually missing and Linux does need to be aware of additional dependencies.

(A few years ago, the device driver core gained support for an error code from probe (-EPROBE_DEFER) that indicates that some dependency is not yet bound, and causes the device to be re-probed later. But this is an incomplete, stop-gap solution.)

The speaker described the new "device links" API which provides a way to record additional dependencies in the device model. The device driver core will use this information to sequence operations on multiple devices correctly.

Metrics are money

Speaker: Aurélien Rougemont

Details and slides:

Video: Youtube

The speaker talked about several instances from his experience where system metrics were used to justify buying or rejecting new hardware. In some cases, these metrics were not accurate or consistent, which could lead to bad decisions. He made a plea for better documentation of metrics reported by the Linux kernel.

No NMI? No Problem! – Implementing Arm64 Pseudo-NMI

Speaker: Julien Thierry

Details and slides:

Video: Youtube

Linux typically uses Non-Maskable Interrupts (NMIs) for Performance Monitoring Unit (PMU) interrupts. NMIs are (almost) never disabled, so this allows interrupt handlers and other code that runs with interrupts disabled to be profiled accurately. On architectures that do not have NMIs, typically Linux can use the highest interrupt priority for this instead, and only mask the lower priorities.

On the Arm architecture, there is no NMI but there are two architectural interrupt priority levels (IRQ and FIQ). However on 64-bit Arm systems FIQ is typically reserved to system firmware so Linux only uses IRQ. This results in inaccurate profiling.

The speaker described the implementation of a pseudo-NMI for 64-bit Arm. This is done by leaving IRQs enabled on the CPU and masking them selectively on the Arm generic interrupt controller (GIC), which supports many more priority levels. However this effectively requires GIC v3 or v4 because these operations are prohibitively slow on earlier versions.

Marvels of Memory Auto-configuration (SPD)

Speaker: Jean Delvare

Details and slides:

Video: Youtube

The speaker talked about the history of standardised DRAM modules (SIMMs and DIMMs) and how system firmware can detect them and find out their size and timing requirements.

DIMMs expose this information through Serial Presence Detect (SPD) which until recently used standard 256-byte I²C EEPROMs.

For the latest generation of DIMMs (DDR4), the configuration information can be larger than 256 bytes and a new interface was required. Jean described and criticised this interfaces.

He also talked about the Linux drivers and utilities that can be used to read the SPD EEPROMs.

posted at: 18:28 | path: / | permanent link to this entry