Message-ID: <90b3b613-8665-425b-8132-5b9ac86ab616@oracle.com>
Date: Thu, 7 Nov 2024 15:05:56 +0000
From: Alan Maguire <alan.maguire@...cle.com>
To: Laura Nao <laura.nao@...labora.com>, regressions@...ts.linux.dev
Cc: linux-kernel@...r.kernel.org, kernel@...labora.com, bpf@...r.kernel.org,
chrome-platform@...ts.linux.dev
Subject: Re: [REGRESSION] module BTF validation failure (Error -22) on next
On 06/11/2024 16:08, Laura Nao wrote:
> Hello,
>
> KernelCI has detected a module loading regression affecting all AMD and
> Intel Chromebooks in the Collabora LAVA lab, occurring between
> next-20241024 and next-20241025.
>
> The logs indicate a failure in BTF module validation, preventing all
> modules from loading correctly (with CONFIG_MODULE_ALLOW_BTF_MISMATCH
> unset). The example below is from an AMD Chromebook (HP 14b na0052xx),
> with similar errors observed on other AMD and Intel devices:
>
> [ 5.284373] failed to validate module [cros_kbd_led_backlight] BTF: -22
> [ 5.291392] failed to validate module [i2c_hid] BTF: -22
> [ 5.293958] failed to validate module [chromeos_pstore] BTF: -22
> [ 5.302832] failed to validate module [coreboot_table] BTF: -22
> [ 5.309175] failed to validate module [raydium_i2c_ts] BTF: -22
> [ 5.309264] failed to validate module [i2c_cros_ec_tunnel] BTF: -22
> [ 5.322158] failed to validate module [typec] BTF: -22
> [ 5.327554] failed to validate module [snd_timer] BTF: -22
> [ 5.327573] failed to validate module [cros_usbpd_notify] BTF: -22
> [ 5.339272] failed to validate module [elan_i2c] BTF: -22
> [ 5.345821] failed to validate module [industrialio] BTF: -22
> [ 5.423113] failed to validate module [cfg80211] BTF: -22
> [ 5.443074] failed to validate module [cros_ec_dev] BTF: -22
> [ 5.448857] failed to validate module [snd_pci_acp3x] BTF: -22
> [ 5.454736] failed to validate module [cros_kbd_led_backlight] BTF: -22
> [ 5.461458] failed to validate module [regmap_i2c] BTF: -22
> [ 5.470228] failed to validate module [i2c_piix4] BTF: -22
> [ 5.491123] failed to validate module [i2c_hid] BTF: -22
> [ 5.491226] failed to validate module [chromeos_pstore] BTF: -22
> [ 5.496519] failed to validate module [coreboot_table] BTF: -22
> [ 5.502632] failed to validate module [snd_timer] BTF: -22
> [ 5.538916] failed to validate module [gsmi] BTF: -22
> [ 5.604971] failed to validate module [mii] BTF: -22
> [ 5.604971] failed to validate module [videobuf2_common] BTF: -22
> [ 5.604972] failed to validate module [sp5100_tco] BTF: -22
> [ 5.616068] failed to validate module [snd_soc_acpi] BTF: -22
> [ 5.680553] failed to validate module [bluetooth] BTF: -22
> [ 5.749320] failed to validate module [chromeos_pstore] BTF: -22
> [ 5.755440] failed to validate module [mii] BTF: -22
> [ 5.760522] failed to validate module [snd_timer] BTF: -22
> [ 5.783549] failed to validate module [bluetooth] BTF: -22
> [ 5.841561] failed to validate module [mii] BTF: -22
> [ 5.846699] failed to validate module [snd_timer] BTF: -22
> [ 5.892444] failed to validate module [mii] BTF: -22
> [ 5.897708] failed to validate module [snd_timer] BTF: -22
> [ 5.945507] failed to validate module [snd_timer] BTF: -22
>
> The full kernel log is available on [1]. The config used is available on
> [2] and the kernel/modules have been built using gcc-12.
>
> The issue is still present on next-20241105.
>
> I'm sending this report to track the regression while a fix is
> identified. The culprit commit hasn't been pinpointed yet, I'll report
> back once it's identified.
>
> Any feedback or suggestion for additional debugging steps would be greatly
> appreciated.
>
> Best,
>
Thanks for the report! Judging from the config, you're seeing this with
pahole v1.24. I have seen issues like this in the past where, during a
kernel build, module BTF is generated against vmlinux BTF, and then
something later re-triggers vmlinux BTF generation. If the regenerated
vmlinux BTF does not assign the same type ids to the same types, modules
end up referring to out-of-date type ids in vmlinux, which can result in
validation errors like those above. That's just a preliminary guess
though; we'll need more info to get to the bottom of this.
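
To make the suspected failure mode concrete, here is a toy model in
Python. It is only an illustration: real BTF is a binary format, and the
type names, the ordering, and both helper functions are made up for this
sketch rather than taken from the kernel or pahole.

```python
# Toy model only: real BTF is binary; these dicts just illustrate the idea.

def assign_ids(type_names):
    """Assign sequential type ids starting at 1, in the order given."""
    return {name: type_id for type_id, name in enumerate(type_names, start=1)}


def stale_references(module_refs, current_ids):
    """Return the type names whose id recorded by the module no longer
    matches the id that the current vmlinux BTF assigns to that name."""
    return sorted(
        name
        for name, type_id in module_refs.items()
        if current_ids.get(name) != type_id
    )


# First vmlinux BTF generation (hypothetical type order).
first_pass = assign_ids(["int", "struct device", "struct module"])

# The module BTF is built against the first pass and records its ids.
module_refs = {"struct device": first_pass["struct device"]}

# A later regeneration emits the same types in a different order.
second_pass = assign_ids(["int", "struct module", "struct device"])

print(stale_references(module_refs, first_pass))
print(stale_references(module_refs, second_pass))
```

Against the first pass the module has no stale references; against the
second pass its recorded id for "struct device" points at a different
type, which is the kind of inconsistency that could surface as -22.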
A few suggestions to help debug this:

- If you have build logs, check BTF generation of vmlinux. Did it in
fact happen twice? Even better, if KernelCI saves build logs, feel free
to send a pointer and I'll take a look.
- Can you post the vmlinux (stripped of DWARF data if possible to limit
size) and one of the failing modules somewhere so we can analyze them?
- Failing that, run

bpftool btf dump file /path/2/vmlinux_from_build > vmlinux.raw

and upload vmlinux.raw along with one of the failing module .ko files.
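
If two raw dumps happen to be available (say, one saved right after the
first BTF generation and one from the final vmlinux), a rough way to
check whether type ids shifted is sketched below. It assumes the usual
raw output layout of one "[id] KIND 'name' ..." header line per type;
the function names and the two-file usage are hypothetical, not part of
bpftool.

```python
import re
import sys

# Header line of each type in a raw dump, for example:
#   [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
TYPE_LINE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+'([^']*)'")


def parse_raw_dump(text):
    """Map type id -> (kind, name) for every type header line in the dump."""
    types = {}
    for line in text.splitlines():
        match = TYPE_LINE.match(line)
        if match:  # indented member/parameter lines do not match
            types[int(match.group(1))] = (match.group(2), match.group(3))
    return types


def changed_ids(old, new):
    """List (id, old_entry, new_entry) for ids whose kind or name differs."""
    return [
        (type_id, old.get(type_id), new.get(type_id))
        for type_id in sorted(old.keys() | new.keys())
        if old.get(type_id) != new.get(type_id)
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 btf_diff.py first.raw second.raw
    with open(sys.argv[1]) as first, open(sys.argv[2]) as second:
        diffs = changed_ids(parse_raw_dump(first.read()),
                            parse_raw_dump(second.read()))
    for type_id, before, after in diffs:
        print(f"[{type_id}] {before} -> {after}")
    print(f"{len(diffs)} type ids differ")
```

An empty result would suggest the ids are stable between the two dumps
at the kind/name level; it does not compare members or sizes, so it is
only a first-pass check.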
I've tried to reproduce this; no luck so far at my end.
Alan
> Laura
>
> [1] https://pastebin.com/raw/dtvzBkxh
> [2] https://pastebin.com/raw/a1MGi3wH
>
> #regzbot introduced: next-20241024..next-20241025
>
>