Message-ID: <20250225025631.GA271248@ax162>
Date: Mon, 24 Feb 2025 18:56:31 -0800
From: Nathan Chancellor <nathan@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, Masami Hiramatsu <mhiramat@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Andrew Morton <akpm@...ux-foundation.org>,
bpf <bpf@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Masahiro Yamada <masahiroy@...nel.org>,
Nicolas Schier <nicolas@...sle.eu>,
Zheng Yejian <zhengyejian1@...wei.com>,
Martin Kelly <martin.kelly@...wdstrike.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>
Subject: Re: [for-next][PATCH 4/6] scripts/sorttable: Zero out weak functions
in mcount_loc table
Hi Steve,
On Wed, Feb 19, 2025 at 10:18:19AM -0500, Steven Rostedt wrote:
> When a function is annotated as "weak" and is overridden, its code is not
> removed. If it is traced, the fentry/mcount location in the weak function
> will still be referenced by the "__mcount_loc" section and will then be
> added to the available_filter_functions list. Since only the addresses of
> the functions are recorded, kallsyms is searched to find the name to show.
>
> Since kallsyms resolves an address to the last function that starts at or
> before it (and ends where the next function begins), an address inside a
> weak function will show up as the function before it. This is because
> kallsyms does not save the names of overridden weak functions. This has
> caused issues in the past, as the traced weak function is then listed in
> available_filter_functions with the name of the function before it.
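A minimal C sketch of the name lookup described above, assuming a simplified symbol table that stores only sorted start addresses and names, with no sizes and no entry for the overridden weak function. The struct, table, and function names are hypothetical and are not the kernel's kallsyms API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol table entry: start address and name only. */
struct sym {
	unsigned long addr;
	const char *name;
};

/* Hypothetical example layout: the overridden weak function's code sits at
 * 0x1100, but the linker kept no symbol for it. */
static const struct sym demo_syms[] = {
	{ 0x1000, "func_a" },
	{ 0x1200, "func_b" },
};

/* Return the name of the last symbol whose start address is <= addr, or
 * NULL if addr precedes every symbol (binary search over syms[0..n)). */
static const char *lookup_name(const struct sym *syms, size_t n,
			       unsigned long addr)
{
	size_t lo = 0, hi = n;	/* search window is [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* lo now counts the symbols that start at or before addr */
	return lo ? syms[lo - 1].name : NULL;
}
```

With this table, lookup_name(demo_syms, 2, 0x1104), an address inside the weak function's leftover code, returns "func_a": the call site is attributed to the preceding function.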
>
> At best, this will cause the previous function's name to be listed twice.
> At worst, if the previous function was marked notrace, it will now show up
> as a function that can be traced. Note that it only appears to be
> traceable: enabling it does not actually trace it, which causes confusion.
>
> https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/
>
> Commit b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid
> adding weak function") worked around this by checking the function
> address before printing its name. If the address was too far from the
> start of the function given by the name, then instead of printing the
> name it would print: __ftrace_invalid_address___<invalid-offset>
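The workaround amounts to an offset check before printing. A minimal C sketch, assuming the lookup already returned the symbol name and the call site's offset from that symbol's start; DEMO_MCOUNT_MAX_OFFSET and format_rec_name() are hypothetical stand-ins (the real FTRACE_MCOUNT_MAX_OFFSET is defined per architecture).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bound on how far a traced call site may sit from the start
 * of its function; the real FTRACE_MCOUNT_MAX_OFFSET is per-arch. */
#define DEMO_MCOUNT_MAX_OFFSET 8UL

/* Write the name to display for a call site 'offset' bytes past the start
 * of 'sym_name'. An offset too large for a function entry means the site
 * belongs to unnamed (weak) code, so print a placeholder instead. */
static void format_rec_name(char *buf, size_t len, const char *sym_name,
			    unsigned long offset)
{
	if (offset > DEMO_MCOUNT_MAX_OFFSET)
		snprintf(buf, len, "__ftrace_invalid_address___%lu", offset);
	else
		snprintf(buf, len, "%s", sym_name);
}
```

Note that this only changes what is printed: the bogus entry still occupies a slot in the table, which is the problem the next paragraph describes.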
>
> The real issue is that these invalid addresses are listed in the ftrace
> lookup table from which available_filter_functions is derived. A
> placeholder must be listed in that file because set_ftrace_filter may be
> given a series of indexes into that file instead of names, which allows
> O(1) lookups when enabling filtering (many tools use this method).
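To illustrate why the placeholder lines cannot simply be dropped, here is a small C sketch assuming a simplified table in which NULL marks an entry belonging to an overridden weak function and indexes are 0-based. All names are hypothetical; real ftrace records and index parsing differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ftrace table: one name per recorded call site;
 * NULL marks an entry whose address belongs to an overridden weak function. */
static const char *const demo_table[] = { "func_a", NULL, "func_b", "func_c" };
#define DEMO_N (sizeof(demo_table) / sizeof(demo_table[0]))

/* What an index written to the filter file selects: the table entry itself,
 * found in O(1) without any name matching. */
static const char *select_by_index(size_t idx)
{
	return idx < DEMO_N ? demo_table[idx] : NULL;
}

/* What a tool would see on line 'idx' if invalid entries were silently
 * dropped from the listing instead of printed as placeholders. */
static const char *listed_without_placeholders(size_t idx)
{
	for (size_t i = 0; i < DEMO_N; i++) {
		if (!demo_table[i])
			continue;	/* skipped: later lines shift up by one */
		if (idx-- == 0)
			return demo_table[i];
	}
	return NULL;
}
```

Here a tool reading "func_b" from line 1 of the placeholder-free listing and writing index 1 back would select the weak entry instead: every index after a dropped entry is off by one.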
>
> Even if kallsyms saved the size of each function, that would not remove
> the need for these placeholders. The real solution is to not add a weak
> function into the ftrace table in the first place.
>
> To solve this, the sorttable.c code that sorts the mcount regions during
> the build is modified to take the output of "nm -S vmlinux" as input and
> sort it. Any address listed in the mcount_loc section that does not fall
> within the boundaries of a function in the list given by nm is considered
> to belong to a weak function and is zeroed out.
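The build-time step can be sketched in C roughly as follows, assuming the "nm -S" output has already been parsed into (start, size) pairs and the mcount_loc entries into plain 64-bit addresses. The types and function names are hypothetical; the real sorttable.c also deals with ELF sections, endianness, and 32/64-bit layouts, none of which is shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One function as reported by "nm -S": start address and size. */
struct func_range {
	uint64_t start;
	uint64_t size;
};

/* qsort() comparator: order ranges by start address. */
static int cmp_range_start(const void *a, const void *b)
{
	const struct func_range *x = a, *y = b;

	return (x->start > y->start) - (x->start < y->start);
}

/* bsearch() comparator: 0 when the key address lies inside the range. */
static int cmp_addr_in_range(const void *key, const void *elem)
{
	uint64_t addr = *(const uint64_t *)key;
	const struct func_range *r = elem;

	if (addr < r->start)
		return -1;
	if (addr >= r->start + r->size)
		return 1;
	return 0;
}

/* Sort the function list, then zero every mcount_loc entry that is not
 * inside any listed function (code left behind by an overridden weak
 * function). Returns the number of entries zeroed. */
static size_t zero_weak_mcount_locs(struct func_range *funcs, size_t nfuncs,
				    uint64_t *locs, size_t nlocs)
{
	size_t zeroed = 0;

	qsort(funcs, nfuncs, sizeof(*funcs), cmp_range_start);
	for (size_t i = 0; i < nlocs; i++) {
		if (!bsearch(&locs[i], funcs, nfuncs, sizeof(*funcs),
			     cmp_addr_in_range)) {
			locs[i] = 0;
			zeroed++;
		}
	}
	return zeroed;
}
```

Sorting once and then using bsearch() keeps the check at O(log n) per mcount_loc entry; this works because function ranges from nm do not overlap, so the "inside the range" comparator is consistent with the sort order.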
>
> Note that this does not mean the entries will remain zero at boot, as
> KASLR will still shift those addresses. To handle this, entries in the
> mcount_loc section are ignored if they are zero or match the
> kaslr_offset() value.
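A minimal C sketch of that boot-time filter, with the KASLR offset passed in as a parameter instead of being read from the kernel's kaslr_offset(). count_valid_locs() is a hypothetical helper, not the code in ftrace_process_locs().

```c
#include <assert.h>
#include <stddef.h>

/* Count the mcount_loc entries that should become ftrace records. An entry
 * zeroed at build time reads as kaslr_off once KASLR has shifted it (or as
 * 0 when no shift was applied), so both values are skipped. */
static size_t count_valid_locs(const unsigned long *locs, size_t n,
			       unsigned long kaslr_off)
{
	size_t valid = 0;

	for (size_t i = 0; i < n; i++) {
		if (locs[i] == 0 || locs[i] == kaslr_off)
			continue;	/* placeholder for a weak function */
		valid++;
	}
	return valid;
}
```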
>
> Before:
>
> ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l
> 551
>
> After:
>
> ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l
> 0
I am also seeing a crash when booting arm64 with certain configurations,
which I do not see with the parent commit.
$ printf 'CONFIG_%s=y\n' FTRACE FUNCTION_TRACER >kernel/configs/repro.config
$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux- mrproper virtconfig repro.config Image.gz
$ qemu-system-aarch64 \
-display none \
-nodefaults \
-cpu max,pauth-impdef=true \
-machine virt,gic-version=max,virtualization=true \
-append 'console=ttyAMA0 earlycon' \
-kernel arch/arm64/boot/Image.gz \
-initrd rootfs.cpio \
-m 512m \
-serial mon:stdio
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
[ 0.000000] Linux version 6.14.0-rc4-next-20250224-dirty (nathan@...62) (aarch64-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP PREEMPT Mon Feb 24 18:47:59 PST 2025
...
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] kernel BUG at arch/arm64/kernel/patching.c:39!
[ 0.000000] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc4-next-20250224-dirty #1
[ 0.000000] Hardware name: linux,dummy-virt (DT)
[ 0.000000] pstate: 000000c9 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : patch_map.constprop.0+0xfc/0x108
[ 0.000000] lr : patch_map.constprop.0+0x3c/0x108
[ 0.000000] sp : ffff96c0b6fa3ce0
[ 0.000000] x29: ffff96c0b6fa3ce0 x28: ffff96c0b6faafd0 x27: 00000000000000ff
[ 0.000000] x26: fff9f3a0c2408080 x25: 0000000000000001 x24: fff9f3a0c2408000
[ 0.000000] x23: 0000000000000000 x22: ffff96c0b72391d8 x21: 00000000000000c0
[ 0.000000] x20: 000016c035400000 x19: 000016c035400000 x18: 00000000f0000000
[ 0.000000] x17: 0000000000000068 x16: 0000000000000100 x15: ffff96c0b6fa39c4
[ 0.000000] x14: 0000000000000008 x13: 0000000000000000 x12: ffffe9ce43090280
[ 0.000000] x11: fff9f3a0dfef80c8 x10: ffffe9ce43090288 x9 : 0000000000000000
[ 0.000000] x8 : fff9f3a0dfef80b8 x7 : fffa5ce02929a000 x6 : ffff96c0b6fa39d0
[ 0.000000] x5 : 0000000000000030 x4 : 0000000000000000 x3 : ffff96c0b69b4000
[ 0.000000] x2 : ffff96c0b69b4000 x1 : 0000000000000000 x0 : 0000000000000000
[ 0.000000] Call trace:
[ 0.000000] patch_map.constprop.0+0xfc/0x108 (P)
[ 0.000000] aarch64_insn_write_literal_u64+0x38/0x80
[ 0.000000] ftrace_init_nop+0x40/0xe0
[ 0.000000] ftrace_process_locs+0x2a8/0x530
[ 0.000000] ftrace_init+0x60/0x130
[ 0.000000] start_kernel+0x4ac/0x708
[ 0.000000] __primary_switched+0x88/0x98
[ 0.000000] Code: d1681000 a8c27bfd d50323bf d65f03c0 (d4210000)
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
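I see a similar crash in ftrace_init() with clang (after applying your
suggested fix for the issue that Arnd brought up), although there it faults
in ftrace_call_adjust() rather than hitting the BUG in patch_map():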
[ 0.000000] Unable to handle kernel paging request at virtual address 00001cb7f7800008
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x000000009600002b
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] FSC = 0x2b: level -1 translation fault
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x0000002b, ISS2 = 0x00000000
[ 0.000000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.000000] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.000000] [00001cb7f7800008] user address but active_mm is swapper
[ 0.000000] Internal error: Oops: 000000009600002b [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc4-next-20250224-dirty #1
[ 0.000000] Hardware name: linux,dummy-virt (DT)
[ 0.000000] pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : ftrace_call_adjust+0x44/0xd0
[ 0.000000] lr : ftrace_process_locs+0x1e0/0x560
[ 0.000000] sp : ffff9cb878f93da0
[ 0.000000] x29: ffff9cb878f93da0 x28: ffff9cb879234000 x27: ffff9cb879234000
[ 0.000000] x26: 00001cb7f7800000 x25: ffff9cb878ed8578 x24: fffac24642008000
[ 0.000000] x23: ffff9cb878f3cf90 x22: fffac24642008000 x21: 0000000000000000
[ 0.000000] x20: 0000000000001000 x19: 00001cb7f7800000 x18: 0000000000000068
[ 0.000000] x17: 0000000000000002 x16: 00000000fffffffe x15: ffff9cb878fa58c0
[ 0.000000] x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000000
[ 0.000000] x11: 0000000000000000 x10: 0000000000000000 x9 : 00007fff80000000
[ 0.000000] x8 : 000000000000201f x7 : 0000000000000000 x6 : 6d6067666871ff73
[ 0.000000] x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000001
[ 0.000000] x2 : 0000000000000004 x1 : 0000000000000040 x0 : 00001cb7f7800000
[ 0.000000] Call trace:
[ 0.000000] ftrace_call_adjust+0x44/0xd0 (P)
[ 0.000000] ftrace_process_locs+0x1e0/0x560
[ 0.000000] ftrace_init+0x7c/0xc8
[ 0.000000] start_kernel+0x160/0x3b8
[ 0.000000] __primary_switched+0x88/0x98
[ 0.000000] Code: aa1f03e0 14000014 aa0003f3 528403e8 (b8408e74)
If there is any other information I can provide or patches I can test, I
am more than happy to do so.
Cheers,
Nathan