Better living through software

Ben Hutchings's diary of life and technology

Email: ben@decadent.org.uk • Twitter: @benhutchingsuk • Debian: benh • Gitweb: git.decadent.org.uk • Github: github.com/bwhacks

Sat, 21 Sep 2019

Linux Plumbers Conference 2019, part 2

Here's the second chunk of notes I took at Linux Plumbers Conference earlier this month. Part 1 covered the Distribution kernels track.

Kernel Debugging Tools BoF

Moderators: George Wilson and Serapheim Dimitropoulos from Delphix; Omar Sandoval from Facebook

Details: https://linuxplumbersconf.org/event/4/contributions/539/

Problem: the ability to easily analyse failures in production (live system) or post-mortem (crash dump).

Debuggers need to:

Most people present use crash; one mentioned crash-python (aka pycrash) and one uses kgdb.

Pain points:

crash-python is a Python layer on top of a gdb fork. Uses libkdumpfile to decode compressed crash-dumps.

drgn (aka Dragon) is a debugger-as-a-library. Excels in introspection of live systems and crash-dumps, and covers both kernel and user-space. It can be extended through Python. As a library it can be imported and used from the Python REPL.

sdb is Delphix's front-end to drgn, providing a more shell-like interactive interface. Example of syntax:

> modules | filter obj.refcnt.counter > 10 | member name

Currently it doesn't always have good type information for memory. A raw virtual address can be typed using the "cast" command in a pipeline. Hoping that BTF will allow doing better.

Allows defining pretty-print functions, though it appears these have to be explicitly invoked.
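To make the semantics of the sdb pipeline example (modules | filter obj.refcnt.counter > 10 | member name) concrete, here is a rough sketch in plain Python. It does not use sdb or drgn at all: the Module and Counter classes, the stage functions and the sample data are all made up for illustration, and only show how each stage consumes the objects produced by the previous one.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, List


@dataclass
class Counter:
    counter: int


@dataclass
class Module:
    # Hypothetical stand-in for a kernel module object; not a drgn type.
    name: str
    refcnt: Counter


def filter_stage(objs: Iterable[Any], pred: Callable[[Any], bool]) -> Iterator[Any]:
    """Pass through only the objects for which pred(obj) is true."""
    return (obj for obj in objs if pred(obj))


def member_stage(objs: Iterable[Any], attr: str) -> Iterator[Any]:
    """Replace each object with the named member of that object."""
    return (getattr(obj, attr) for obj in objs)


# Made-up data standing in for the output of the "modules" stage.
SAMPLE_MODULES = [
    Module("ext4", Counter(12)),
    Module("loop", Counter(1)),
    Module("nf_tables", Counter(25)),
]


def busy_module_names(modules: Iterable[Module]) -> List[str]:
    # modules | filter obj.refcnt.counter > 10 | member name
    busy = filter_stage(modules, lambda obj: obj.refcnt.counter > 10)
    return list(member_stage(busy, "name"))
```

Each stage is a generator, so nothing is evaluated until the final list() call, which is roughly how a shell-like pipeline behaves.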

Answering tough questions:

Some discussion around the fact that drgn has a lot of code that's dependent on kernel version, as internal structures change. How can it be kept in sync with the kernel? Could some of that code be moved into the kernel tree?

Omar (I think) said that his approach was to make drgn support multiple versions of structure definitions.

Q: How does this scale to the many different kernel branches that are used in different distributions and different hardware platforms?

A: drgn will pick up BTF structure definitions. When BTF is available the code only needs to handle addition/removal of members it accesses.
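My reading of "handle addition/removal of members" is something like the following minimal sketch. It is not drgn's API: the record is a plain dict standing in for a decoded structure, and the member names nr_items and item_count are hypothetical. The helper simply tries each known name for a member in turn, so one piece of debugger code can cope with a member that was renamed or dropped between kernel versions.

```python
from typing import Any, Mapping, Sequence

_MISSING = object()


def first_member(record: Mapping[str, Any],
                 candidates: Sequence[str],
                 default: Any = None) -> Any:
    """Return the value of the first candidate member present in record.

    Falls back to default if none of the candidate names exist, which
    models a member that has been removed in this kernel version.
    """
    for name in candidates:
        value = record.get(name, _MISSING)
        if value is not _MISSING:
            return value
    return default


# Two hypothetical layouts of the same structure in different kernels.
OLD_LAYOUT = {"nr_items": 3}
NEW_LAYOUT = {"item_count": 5}
```

With this, first_member(rec, ["item_count", "nr_items"]) works against either layout without a version check.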

Brendan Gregg made a plea to distro maintainers to enable BTF (CONFIG_DEBUG_INFO_BTF).
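A quick way to see whether a running kernel provides BTF is sketched below. This was not shown in the session; it assumes the kernel is recent enough to expose its BTF at /sys/kernel/btf/vmlinux, so a False result on an older kernel does not prove the option is disabled. The function name is my own.

```python
import os

# Where recent kernels built with CONFIG_DEBUG_INFO_BTF expose vmlinux BTF.
# Assumption: older kernels with the option enabled may not provide this file.
DEFAULT_BTF_PATH = "/sys/kernel/btf/vmlinux"


def kernel_has_btf(path: str = DEFAULT_BTF_PATH) -> bool:
    """Return True if the BTF blob for the running kernel is present."""
    return os.path.exists(path)


if __name__ == "__main__":
    print("BTF available" if kernel_has_btf() else "BTF not found")
```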

Wayland BoF

Moderator: Hans de Goede of Red Hat

Details: https://linuxplumbersconf.org/event/4/contributions/533/

Pain points and missing pieces with Wayland, or specifically GNOME Shell:

IoT from the point of view of a generic and enterprise distribution

Speaker: Peter Robinson of Red Hat

Details: https://linuxplumbersconf.org/event/4/contributions/439/

The good

Can now use u-boot with UEFI support on most Arm hardware. Much easier to use a common kernel on multiple hardware platforms, and UEFI boot can be assumed.

The bad

"Enterprise" and "industrial" IoT is not a Raspberry Pi. Problems result from a lot of user-space assuming the world is an RPi.

Is bluez still maintained? No user-space releases for 15 months! Upstream not convinced this is a problem, but distributions now out of synch as they have to choose between last release and arbitrary git snapshot.

Wi-fi and Bluetooth firmware fixes (including security fixes) missing from linux-firmware.git. RPi Foundation has improved Bluetooth firmware for the chip they use but no-one else can redistribute it.

Lots of user-space uses /sys/class/gpio, which is now deprecated and can be disabled in kconfig. libgpiod would abstract this, but has poor documentation. Most other GPIO libraries don't work with new GPIO UAPI.
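As a small illustration of the two interfaces, this sketch reports which of them a system offers. It is not from the talk and deliberately avoids any GPIO library: it only looks for the character devices used by the new GPIO UAPI (/dev/gpiochipN) and for the legacy /sys/class/gpio directory. The function name and the returned keys are my own choices.

```python
import glob
import os
from typing import Dict, List, Union


def gpio_interfaces(dev_dir: str = "/dev",
                    sysfs_gpio: str = "/sys/class/gpio"
                    ) -> Dict[str, Union[List[str], bool]]:
    """Report which GPIO user-space interfaces appear to be available."""
    # Character devices for the new UAPI are named gpiochip0, gpiochip1, ...
    chips = sorted(glob.glob(os.path.join(dev_dir, "gpiochip[0-9]*")))
    return {
        "chardev_chips": chips,
        # The legacy interface is absent if it was disabled in kconfig.
        "legacy_sysfs": os.path.isdir(sysfs_gpio),
    }


if __name__ == "__main__":
    print(gpio_interfaces())
```

Software that still depends on the legacy directory could use a check like this to fail with a clear message instead of a confusing "no such file" error.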

Similar issues with IIO - a lot of user-space doesn't use it but uses user-space drivers banging GPIOs etc. libiio exists but again has poor documentation.

For some drivers, even newly added drivers, the firmware has not been added to linux-firmware.git. Isn't there a policy that it should be? It seems to be an unwritten rule at present.

Toolchain track

Etherpad: https://etherpad.net/p/LPC2019_TC/timeslider#5767

Security feature parity between GCC and Clang

Speaker: Kees Cook of Google

Details: https://linuxplumbersconf.org/event/4/contributions/398/

LWN article: https://lwn.net/Articles/798913/

Analyzing changes to the binary interface exposed by the Kernel to its modules

Speaker: Dodji Seketeli of Red Hat

Details: https://linuxplumbersconf.org/event/4/contributions/399/

Wrapping system calls in glibc

Speaker: Maciej Rozycki of WDC

Details: https://linuxplumbersconf.org/event/4/contributions/397/

LWN article: https://lwn.net/Articles/799331/

posted at: 00:09 | path: / | permanent link to this entry

Thu, 12 Sep 2019

Debian LTS work, August 2019

I was assigned 20 hours of work by Freexian's Debian LTS initiative and worked all those hours this month.

I prepared and, after review, released Linux 3.16.72, including various security and other fixes. I then rebased the Debian package onto that. I uploaded that with a small number of other fixes and issued DLA-1884-1. I also prepared and released Linux 3.16.73 with another small set of fixes.

I backported the latest security update for Linux 4.9 from stretch to jessie and issued DLA-1885-1 for that.

posted at: 13:45 | path: / | permanent link to this entry

Mon, 09 Sep 2019

Distribution kernels at Linux Plumbers Conference 2019

I'm attending the Linux Plumbers Conference in Lisbon from Monday to Wednesday this week. This morning I followed the "Distribution kernels" track, organised by Laura Abbott.

I took notes, included below, mostly with a view to what could be relevant to Debian. Other people took notes in Etherpad. There should also be video recordings available at some point.

Upstream 1st: Tools and workflows for multi kernel version juggling of short term fixes, long term support, board enablement and features with the upstream kernel

Speaker: Bruce Ashfield, working on Yocto at Xilinx.

Details: https://linuxplumbersconf.org/event/4/contributions/467/

Yocto's kernel build recipes need to support multiple active kernel versions (3+ supported streams), multiple architectures, and many different boards. Many patches are required for hardware and other feature support including -rt and aufs.

Goals for maintenance:

Other distributions have similar goals but very few tools in common. So there is a lot of duplicated effort.

Supporting developers, distro builds and end users is challenging. E.g. developers complained about Yocto having separate git repos for different kernel versions, as this led to them needing more disk space.

Yocto solution:

Using Yocto to build a distro and maintain a kernel tree

Speakers: Senthil Rajaram & Anatol Belski from Microsoft

Details: https://linuxplumbersconf.org/event/4/contributions/469/

Microsoft chose Yocto as build tool for maintaining Linux distros for different internal customers. Wanted to use a single kernel branch for different products but it was difficult to support all hardware this way.

Maintaining config fragments and sensible inheritance tree is difficult (?). It might be helpful to put config fragments upstream.

Laura Abbott said that the upstream kconfig system had some support for fragments now, and asked what sort of config fragments would be useful. There seemed to be consensus on adding fragments for specific applications and use cases like "what Docker needs".
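For anyone unfamiliar with config fragments, the basic idea can be sketched as below: fragments are applied in order and a later fragment overrides an earlier one. This is only a toy model and not the upstream tooling; it does no dependency resolution, which the real kconfig system does. The example fragments are illustrative and are not a claim about what Docker actually needs.

```python
import re
from typing import Dict

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def merge_fragments(*fragments: str) -> Dict[str, str]:
    """Merge config fragments in order; later settings override earlier ones.

    Options marked "is not set" are recorded with the value "n".
    """
    merged: Dict[str, str] = {}
    for text in fragments:
        for line in text.splitlines():
            line = line.strip()
            match = _SET.match(line)
            if match:
                merged[match.group(1)] = match.group(2)
                continue
            match = _UNSET.match(line)
            if match:
                merged[match.group(1)] = "n"
    return merged


# Illustrative fragments only.
BASE = "CONFIG_NAMESPACES=y\n# CONFIG_OVERLAY_FS is not set\n"
CONTAINERS = "CONFIG_OVERLAY_FS=m\nCONFIG_CGROUPS=y\n"
```

Here merge_fragments(BASE, CONTAINERS) turns the overlay filesystem on as a module, while reversing the order would leave it disabled, which is why a sensible inheritance order matters.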

Kernel build should be decoupled from image build, to reduce unnecessary rebuilding.

Initramfs is unpacked from cpio, which doesn't support SELinux. So they build an initramfs into the kernel, and add a separate initramfs containing a squashfs image which the initramfs code will switch to.

Making it easier for distros to package kernel source

Speaker: Don Zickus, working on RHEL at Red Hat.

Details: https://linuxplumbersconf.org/event/4/contributions/466/

Fedora/RHEL approach:

Lots of discussion about whether config can be shared upstream, but no agreement on that.

Kyle McMartin(?): Everyone does the hierarchical config layout - like generic, x86, x86-64 - can we at least put this upstream?

Monitoring and Stabilizing the In-Kernel ABI

Speaker: Matthias Männich, working on Android kernel at Google.

Details: https://linuxplumbersconf.org/event/4/contributions/468/

Why does Android need it?

Project Treble made most of Android user-space independent of device. Now they want to make the kernel and in-tree modules independent too. For each kernel version and architecture there should be a single ABI. Currently they accept one ABI bump per year. Requires single kernel configuration and toolchain. (Vendors would still be allowed to change configuration so long as it didn't change ABI - presumably to enable additional drivers.)

ABI stability is scoped - i.e. they include/exclude which symbols need to be stable.

ABI is compared using libabigail, not genksyms. (Looks like they were using it for libraries already, so now using it for kernel too.)

Q: How can we ignore compatible struct extensions with libabigail?

A: (from Dodji Seketeli, main author) You can add specific "suppressions" for such additions.
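To spell out what a "compatible struct extension" might mean here, the sketch below compares two layouts of the same structure and accepts the new one only if every existing member is unchanged and the new members come after them. This is my own simplification, not how libabigail works internally, and the Member tuple is a made-up representation. Whether appending is really safe also depends on who allocates or embeds the structure, which is presumably why such changes need an explicit suppression.

```python
from typing import List, Tuple

# (member name, type name, byte offset); a hypothetical representation.
Member = Tuple[str, str, int]


def is_compatible_extension(old: List[Member], new: List[Member]) -> bool:
    """True if new keeps all old members unchanged and only appends members.

    Both lists are expected to be ordered by offset.
    """
    if len(new) < len(old):
        return False  # a member was removed
    if new[:len(old)] != old:
        return False  # an existing member moved, was renamed or retyped
    last_offset = old[-1][2] if old else -1
    # Every added member must sit after the last original member.
    return all(offset > last_offset for _, _, offset in new[len(old):])


OLD = [("id", "int", 0), ("flags", "unsigned long", 8)]
APPENDED = OLD + [("extra", "void *", 16)]
INSERTED = [("id", "int", 0), ("extra", "void *", 8), ("flags", "unsigned long", 16)]
```

In this model APPENDED would be ignored as compatible, while INSERTED would still be reported because it moves flags.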

KernelCI applied to distributions

Speaker: Guillaume Tucker from Collabora.

Details: https://linuxplumbersconf.org/event/4/contributions/470/

Can KernelCI be used to build distro kernels?

KernelCI currently builds arbitrary branch with in-tree defconfig or small config fragment.

Improvements needed:

Some in audience questioned whether building a package was necessary.

Possible further improvements:

Should KernelCI be used to build distro kernels?

Seems like a pretty close match. Adding support for different use-cases is healthy for KernelCI project. It will help distro kernels stay close to upstream, and distro vendors will then want to contribute to KernelCI.

Discussion

Someone pointed out that this is not only useful for distributions. Distro kernels are sometimes used in embedded systems, and the system builders also want to check for regressions on their specific hardware.

Q: (from Takashi Iwai) How long does testing typically take? SUSE's full automated tests take ~1 week.

A: A few hours to build, depending on system load, and up to 12 hours to complete boot tests.

Automatically testing distribution kernel packages

Speaker: Alice Ferrazzi of Gentoo.

Details: https://linuxplumbersconf.org/event/4/contributions/471/

Gentoo wants to provide safe, tested kernel packages. Currently testing gentoo-sources and derived packages. gentoo-sources combines upstream kernel source and "genpatches", which contains patches for bug fixes and target-specific features.

Testing multiple kernel configurations - allyesconfig, defconfig, other reasonable configurations. Building with different toolchains.

Tests are implemented using buildbot. Kernel is installed on top of a Gentoo image and then booted in QEMU.

Generalising for discussion:

Don Zickus talked briefly about Red Hat's experience. They eventually settled on Gitlab CI for RHEL.

Some discussion of what test suites to run, and whether they are reliable. Varying opinions on LTP.

There is some useful scripting for different test suites at https://github.com/linaro/test-definitions.

Tim Bird talked about his experience testing with Fuego. A lot of the test definitions there aren't reusable. kselftest currently is hard to integrate because tests are supposed to follow TAP13 protocol for reporting but not all of them do!
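To show why non-conforming output is a problem for integration, here is a minimal sketch of a TAP result-line parser. It is not from Fuego or kselftest and handles only a small part of TAP13: result lines with an optional number, description and SKIP/TODO directive. Version, plan and comment lines are skipped, and anything else is returned separately as unparsed. The TapResult type and function name are my own.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class TapResult(NamedTuple):
    ok: bool
    number: Optional[int]
    description: str
    directive: Optional[str]  # "SKIP", "TODO" or None


_TEST_LINE = re.compile(
    r"^(?P<status>ok|not ok)\b"
    r"(?:\s+(?P<number>\d+))?"
    r"\s*-?\s*(?P<description>.*?)"
    r"\s*(?:#\s*(?P<directive>skip|todo)\b.*)?$",
    re.IGNORECASE,
)
_PLAN_LINE = re.compile(r"^\d+\.\.\d+")


def parse_tap(text: str) -> Tuple[List[TapResult], List[str]]:
    """Return (parsed results, lines that are not valid result lines)."""
    results: List[TapResult] = []
    unparsed: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, the version line and the plan.
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith("TAP version") or _PLAN_LINE.match(stripped):
            continue
        match = _TEST_LINE.match(stripped)
        if match is None:
            unparsed.append(line)
            continue
        number = match.group("number")
        directive = match.group("directive")
        results.append(TapResult(
            ok=match.group("status").lower() == "ok",
            number=int(number) if number else None,
            description=match.group("description"),
            directive=directive.upper() if directive else None,
        ))
    return results, unparsed


SAMPLE = (
    "TAP version 13\n"
    "1..3\n"
    "ok 1 - open file\n"
    "not ok 2 - write file # TODO not implemented\n"
    "ok 3 # SKIP no device\n"
    "test_foo: PASS\n"
)
```

The last line of SAMPLE is the kind of ad-hoc output that ends up in the unparsed list, so a harness has to either guess at its meaning or ignore it.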

Distros and Syzkaller - Why bother?

Speaker: George Kennedy, working on virtualisation at Oracle.

Details: https://linuxplumbersconf.org/event/4/contributions/473/

Which distros are using syzkaller? Apparently Google uses it for Android, ChromeOS, and internal kernels.

Oracle is using syzkaller as part of CI for Oracle Linux. "syz-manager" schedules jobs on dedicated servers. There is a cron job that automatically creates bug reports based on crashes triggered by syzkaller.

Google's syzbot currently runs syzkaller on GCE. Planning to also run on QEMU with a wider range of emulated devices.

How to make syzkaller part of distro release process? Need to rebuild the distro kernel with config changes to make syzkaller work better (KASAN, KCOV, etc.) and then install kernel in test VM image.
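A rebuilt kernel's configuration could be checked with something like the sketch below before it goes into the test image. It is my own illustration rather than anything presented: it only covers the two options named above (the "etc." is left open), and the function name is hypothetical.

```python
from typing import Iterable, List

# Only the options named in the talk; the full list was not given.
FUZZING_OPTIONS = ("CONFIG_KASAN", "CONFIG_KCOV")


def missing_options(config_text: str,
                    required: Iterable[str] = FUZZING_OPTIONS) -> List[str]:
    """Return the required options that are not set to 'y' in a .config."""
    enabled = set()
    for line in config_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


DISTRO_CONFIG = "CONFIG_KCOV=y\n# CONFIG_KASAN is not set\n"
```

For DISTRO_CONFIG this reports CONFIG_KASAN as missing, which is the sort of difference between the shipped and the fuzzing configuration that has to be managed.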

How to correlate crashes detected on distro kernel with those known and fixed upstream?

Example of benefit: syzkaller found regression in rds_sendmsg, fixed upstream and backported into the distro, but then regressed in Oracle Linux. It turned out that patches to upgrade rds had undone the fix.

syzkaller can generate test cases that fail to build on old kernel versions due to symbols missing from UAPI headers. How to avoid this?

Q: How often does this catch bugs in the distro kernel?

A: It doesn't often catch new bugs but does catch missing fixes and regressions.

Q: Is anyone checking the syzkaller test cases against backported fixes?

A: Yes [but it wasn't clear who or when]

Google has public database of reproducers for all the crashes found by syzbot.

Wish list:

Other possible types of fuzzing (mostly concentrated on KVM):

posted at: 14:32 | path: / | permanent link to this entry

Sat, 27 Jul 2019

Debian LTS work, July 2019

I was assigned 18.5 hours of work by Freexian's Debian LTS initiative and worked all those hours this month.

I prepared and released Linux 3.16.70 with various fixes from upstream. I then rebased jessie's linux package on this. Later in the month, I picked the fix for CVE-2019-13272, uploaded the package, and issued DLA-1862-1. I also released Linux 3.16.71 with just that fix.

I backported the latest security update for Linux 4.9 from stretch to jessie and issued DLA-1863-1.

posted at: 14:40 | path: / | permanent link to this entry

Talk: What's new in the Linux kernel (and what's missing in Debian)

As planned, I presented my annual talk about Linux kernel changes at DebConf on Monday—remotely. (I think this was a DebConf first.)

A video recording is already available (high quality, low quality). The slides are linked from my talks page and from the DebConf event page.

Thanks again to the video team for taking the time to work out video and audio routing with me.

posted at: 14:24 | path: / | permanent link to this entry