[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c5d991d2-a7ef-4f14-b0c5-374371275057@linux.dev>
Date: Thu, 25 Dec 2025 21:04:59 -0800
From: Yonghong Song <yonghong.song@...ux.dev>
To: Ihor Solodrai <ihor.solodrai@...ux.dev>,
Luis Chamberlain <mcgrof@...nel.org>, Petr Pavlu <petr.pavlu@...e.com>,
Daniel Gomez <da.gomez@...nel.org>, Nathan Chancellor <nathan@...nel.org>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
<martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-modules@...r.kernel.org,
bpf@...r.kernel.org, linux-kbuild@...r.kernel.org, llvm@...ts.linux.dev
Subject: Re: [RFC PATCH v1] module: Fix kernel panic when a symbol st_shndx is
out of bounds
On 12/23/25 9:36 PM, Yonghong Song wrote:
>
>
> On 12/23/25 4:57 PM, Ihor Solodrai wrote:
>> I've been chasing down the following flaky splat, introduced by recent
>> changes in BTF generation [1]:
>>
>> ------------[ cut here ]------------
>> BUG: unable to handle page fault for address: ffa000000233d828
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 100000067 P4D 100253067 PUD 100258067 PMD 0
>> Oops: Oops: 0000 [#1] SMP NOPTI
>> CPU: 1 UID: 0 PID: 390 Comm: test_progs Tainted: G W
>> OE 6.19.0-rc1-gf785a31395d9 #331 PREEMPT(full)
>> Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 1.16.3-4.el9 04/01/2014
>> RIP: 0010:simplify_symbols+0x2b2/0x480
>> 9.737179] Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83
>> fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20
>> c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
>> RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
>> RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
>> RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
>> RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
>> R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
>> R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
>> FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
>> PKRU: 55555554
>> Call Trace:
>> <TASK>
>> ? __kmalloc_node_track_caller_noprof+0x37f/0x740
>> ? __pfx_setup_modinfo_srcversion+0x10/0x10
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? kstrdup+0x4a/0x70
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? setup_modinfo_srcversion+0x1a/0x30
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? setup_modinfo+0x12b/0x1e0
>> load_module+0x133a/0x1610
>> __x64_sys_finit_module+0x31b/0x450
>> ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> do_syscall_64+0x80/0x2d0
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? exc_page_fault+0x95/0xc0
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> RIP: 0033:0x7f1c63a2582d
>> 9.794028] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e
>> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
>> 08 0f 05 <48> 3d 01 f0 ff 8 8b 0d bb 15 0f 00 f7 d8 64 89 01 48
>> RSP: 002b:00007ffe513df128 EFLAGS: 00000206 ORIG_RAX:
>> 0000000000000139
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1c63a2582d
>> RDX: 0000000000000000 RSI: 0000000000ee83f9 RDI: 0000000000000016
>> RBP: 00007ffe513df150 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffe513e3588
>> R13: 000000000088fad0 R14: 00000000014bddb0 R15: 00007f1c63ba7000
>> </TASK>
>> Modules linked in: bpf_testmod(OE)
>> CR2: ffa000000233d828
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:simplify_symbols+0x2b2/0x480
>> 9.821595] Code: 85 f6 4d 89 f7 b8 01 00 00 00 4c 0f 44 f8 49 83
>> fd f0 4d 0f 44 fe 75 5b 4d 85 ff 0f 85 76 ff ff ff eb 50 49 8b 4e 20
>> c1 e0 06 <48> 8b 44 01 10 9 cf fd ff ff 49 89 c5 eb 36 49 c7 c5
>> RSP: 0018:ffa00000017afc40 EFLAGS: 00010216
>> RAX: 00000000003fffc0 RBX: 0000000000000002 RCX: ffa0000001f3d858
>> RDX: ffffffffc0218038 RSI: ffffffffc0218008 RDI: aaaaaaaaaaaaaaab
>> RBP: ffa00000017afd18 R08: 0000000000000072 R09: 0000000000000069
>> R10: ffffffff8160d6ca R11: 0000000000000000 R12: ffa0000001f3d577
>> R13: ffffffffc0214058 R14: ffa00000017afdc0 R15: ffa0000001f3e518
>> FS: 00007f1c638654c0(0000) GS:ff1100089b7bc000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffa000000233d828 CR3: 000000010ba1f001 CR4: 0000000000771ef0
>> PKRU: 55555554
>> Kernel panic - not syncing: Fatal exception
>> Kernel Offset: disabled
>>
>> This hasn't happened on BPF CI so far, for example, however I was able
>> to reproduce it on a particular x64 machine using a kernel built with
>> LLVM 20.
>>
>> The crash happens on attempt to load one of the BPF selftest modules
>> (tools/testing/selftests/bpf/test_kmods/bpf_test_modorder_x.ko) which
>> is used by kfunc_module_order test.
>>
>> The reason for the crash is that simplify_symbols() doesn't check for
>> bounds of the ELF section index:
>>
>> for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
>> const char *name = info->strtab + sym[i].st_name;
>>
>> switch (sym[i].st_shndx) {
>> case SHN_COMMON:
>>
>> [...]
>>
>> default:
>> /* Divert to percpu allocation if a percpu var. */
>> if (sym[i].st_shndx == info->index.pcpu)
>> secbase = (unsigned long)mod_percpu(mod);
>> else
>> /** HERE --> **/ secbase =
>> info->sechdrs[sym[i].st_shndx].sh_addr;
>> sym[i].st_value += secbase;
>> break;
>> }
>> }
>>
>> And in the case I was able to reproduce, the value 0xffff
>> (SHN_HIRESERVE aka SHN_XINDEX [2]) fell through here.
>>
>> Now this code fragment is between 15 and 20 years old, so obviously
>> it's not expected for a kmodule symbol to have such st_shndx
>> value. Even so, the kernel probably should fail loading the module
>> instead of crashing, which is what this patch attempts to fix.
>>
>> Investigating further, I discovered that the module binary became
>> corrupted by `${OBJCOPY} --update-section` operation updating .BTF_ids
>> section data in scripts/gen-btf.sh. This explains how the bug has
>> surfaced after gen-btf.sh was introduced:
>>
>> $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended
>> symbol index (2), but unable to locate the extended symbol index table
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended
>> symbol index (3), but unable to locate the extended symbol index table
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended
>> symbol index (4), but unable to locate the extended symbol index table
>> 3: 0000000000000000 16 NOTYPE LOCAL DEFAULT RSV[0xffff]
>> __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
>> llvm-readelf: warning: 'bpf_test_modorder_x.ko': found an extended
>> symbol index (16), but unable to locate the extended symbol index table
>> 4: 0000000000000008 4 OBJECT LOCAL DEFAULT RSV[0xffff]
>> __BTF_ID__func__bpf_test_modorder_retx__44417
>>
>> vs expected
>>
>> $ llvm-readelf -s --wide bpf_test_modorder_x.ko | grep 'BTF_ID'
>> 3: 0000000000000000 16 NOTYPE LOCAL DEFAULT 6
>> __BTF_ID__set8__bpf_test_modorder_kfunc_x_ids
>> 4: 0000000000000008 4 OBJECT LOCAL DEFAULT 6
>> __BTF_ID__func__bpf_test_modorder_retx__44417
>>
>> But why? Updating section data without changing it's size is not
>> supposed to affect sections indices, right?
>>
>> With a bit more testing I confirmed that this is a LLVM-specific
>> issue (doesn't reproduce with GCC kbuild), and it's not stable,
>> because in link-vmlinux.h we also do:
>>
>> ${OBJCOPY} --update-section .BTF_ids=${btfids_vmlinux} ${VMLINUX}
>>
>> However:
>>
>> $ llvm-readelf -s --wide ~/workspace/prog-aux/linux/vmlinux | grep
>> 0xffff
>> # no output, which is good
>>
>> So the suspect is the implementation of llvm-objcopy. As it turns out
>> there is a relevant known bug that explains the flakiness and isn't
>> fixed yet [3].
>>
>> [1]
>> https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev/
>> [2] https://man7.org/linux/man-pages/man5/elf.5.html
>> [3]
>> https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952
>>
>> Signed-off-by: Ihor Solodrai <ihor.solodrai@...ux.dev>
>>
>> ---
>>
>> RFC
>>
>> While this llvm-objcopy bug is not fixed, we can not trust it in the
>> kernel build pipeline. In the short-term we have to come up with a
>> workaround for .BTF_ids section update and replace the calls to
>> ${OBJCOPY} --update-section with something else.
>>
>> One potential workaround is to force the use of the objcopy (from
>> binutils) instead of llvm-objcopy when updating .BTF_ids section.
>>
>> Alternatively, we could just dd the .BTF_ids data computed by
>> resolve_btfids at the right offset in the target ELF file.
>>
>> Surprisingly I couldn't find a good way to read a section offset and
>> size from the ELF with a specified format in a command line. Both
>> readelf and {llvm-}objdump give a human readable output, and it
>> appears we can't rely on the column order, for example.
>>
>> We could still try parsing readelf output with awk/grep, covering
>> output variants that appear in the kernel build.
>>
>> We can also do:
>>
>> llvm-readobj --elf-output-style=JSON --sections "$elf" | \
>> jq -r --arg name .BTF_ids '
>> .[0].Sections[] |
>> select(.Section.Name.Name == $name) |
>> "\(.Section.Offset) \(.Section.Size)"'
>>
>> ...but idk man, doesn't feel right.
>>
>> Most reliable way to determine the size and offset of .BTF_ids section
>> is probably reading them by a C program with libelf, such as
>> resolve_btfids. Which is quite ironic, given the recent
>> changes. Setting the irony aside, we could add smth like:
>> resolve_btfids --section-info=.BTF_ids $elf
>>
>> Reverting the gen-btf.sh patch is also a possible workaround, but I'd
>> really like to avoid it, given that BPF features/optimizations in
>> development depend on it.
>>
>> I'd appreciate comments and suggestions on this issue. Thank you!
>> ---
>> kernel/module/main.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index 710ee30b3bea..5bf456fad63e 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -1568,6 +1568,13 @@ static int simplify_symbols(struct module
>> *mod, const struct load_info *info)
>> break;
>> default:
>> + if (sym[i].st_shndx >= info->hdr->e_shnum) {
>> + pr_err("%s: Symbol %s has an invalid section index
>> %u (max %u)\n",
>> + mod->name, name, sym[i].st_shndx,
>> info->hdr->e_shnum - 1);
>> + ret = -ENOEXEC;
>> + break;
>> + }
>> +
>> /* Divert to percpu allocation if a percpu var. */
>> if (sym[i].st_shndx == info->index.pcpu)
>> secbase = (unsigned long)mod_percpu(mod);
>
> I tried both llvm21 and llvm22 (where llvm21 is used in bpf ci).
>
> Without KASAN, I can reproduce the failure for llvm19/llvm21/llvm22.
> I did not test llvm20 and I assume it may fail too.
>
> The following llvm patch
> https://github.com/llvm/llvm-project/pull/170462
> can fix the issue. Currently it is still in review stage. The actual
> diff is
>
> diff --git a/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> index e5de17e093df..cc1527d996e2 100644
> --- a/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> +++ b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
> @@ -2168,7 +2168,11 @@ Error Object::updateSectionData(SecPtr &Sec,
> ArrayRef<uint8_t> Data) {
> Data.size(), Sec->Name.c_str(), Sec->Size);
>
> if (!Sec->ParentSegment) {
> - Sec = std::make_unique<OwnedDataSection>(*Sec, Data);
> + SectionBase *Replaced = Sec.get();
> + SectionBase *Modified = &addSection<OwnedDataSection>(*Sec, Data);
> + DenseMap<SectionBase *, SectionBase *> Replacements{{Replaced,
> Modified}};
> + if (auto err = replaceSections(Replacements))
> + return err;
> } else {
> // The segment writer will be in charge of updating these contents.
> Sec->Size = Data.size();
>
> I applied the above patch to latest llvm21 and llvm22 and
> the crash is gone and the selftests can run properly.
>
> With KASAN, everything is okay for llvm21 and llvm22.
>
> Not sure whether the llvm patch
> https://github.com/llvm/llvm-project/pull/170462
> can make into llvm21 or not as looks like llvm21 intends to
> freeze for now. See
> https://github.com/llvm/llvm-project/pull/168314#issuecomment-3645797175
> the llvm22 will branch into rc mode in January.
>
> I will try to see whether we can have a reasonable workaround
> for llvm21 llvm-objcopy (for without KASAN).
>
I commented the llvm patch https://github.com/llvm/llvm-project/pull/170462
and hopefully the fix can land soon.
I didn't find a good solution. Currently if there are kfunc's in the module,
.BTF_ids section will be created. Previously, resolve_btfids will resolve
.BTF_ids such that the count and btf id will be resolved by resolve_btfids
itself.
The current approach, resolve_btfids will not populate the *correct* contents
to .BTF_ids section. Rather it created another file and try to do
update-section. This should work. But it may not work due to the llvm bug.
One possible workaround is in resolve_btfids, the .BTF_ids section is populated
correct contents and remove update-section for .BTF_ids.
Powered by blists - more mailing lists