Message-ID: <20250117084038.79f40307@gandalf.local.home>
Date: Fri, 17 Jan 2025 08:40:38 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Alexandre Ferrieux <alexandre.ferrieux@...il.com>
Cc: linux-trace-users@...r.kernel.org, Lorenzo Stoakes
<lorenzo.stoakes@...cle.com>, LKML <linux-kernel@...r.kernel.org>,
linux-mm@...ck.org
Subject: Re: Bug: broken /proc/kcore in 6.13
[ Cc'ing the proper folks ]
-- Steve
On Fri, 17 Jan 2025 11:36:05 +0100
Alexandre Ferrieux <alexandre.ferrieux@...il.com> wrote:
> Hi,
>
> Somewhere in the 6.13 branch (not bisected yet, sorry), it stopped being
> possible to disassemble the running kernel from gdb through /proc/kcore.
>
> More precisely:
>
> - look up a function in /proc/kallsyms => 0xADDRESS
> - tell gdb to "core /proc/kcore"
> - tell gdb to "disass 0xADDRESS,+LENGTH" (no need for a symbol table)
>
> * if the function is within the main kernel text, it is okay
> * if the function is within a module's text, an infinite loop happens:
>
>
> Example:
>
> # egrep -w ice_process_skb_fields\|ksys_write /proc/kallsyms
> ffffffffaf296c80 T ksys_write
> ffffffffc0b67180 t ice_process_skb_fields [ice]
>
> # gdb -ex "core /proc/kcore" -ex "disass 0xffffffffaf296c80,+256" -ex quit
> ...
> Dump of assembler code from 0xffffffffaf296c80 to 0xffffffffaf296d80:
> ...
> End of assembler dump.
>
> # gdb -ex "core /proc/kcore" -ex "disass 0xffffffffc0b67180,+256" -ex quit
> ...
> Dump of assembler code from 0xffffffffc0b67180 to 0xffffffffc0b67280:
> (***NOTHING***)
> ^C <= ineffective, need kill -9
>
>
> Ftrace (see below) shows that in this case read_kcore_iter() calls
> vread_iter() in an infinite loop:
>
>         while (true) {
>                 read += vread_iter(iter, src, left);
>                 if (read == tsz)
>                         break;
>
>                 src += read;
>                 left -= read;
>
>                 if (fault_in_iov_iter_writeable(iter, left)) {
>                         ret = -EFAULT;
>                         goto out;
>                 }
>         }
>
> As it turns out, in the offending situation, vread_iter() keeps returning 0,
> with "read" staying at its initial value of 0, and "tsz" nonzero. As a
> consequence, "src" stays stuck in a place where vread_iter() fails.
>
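
To illustrate the failure mode described above, here is a minimal userspace
sketch in C of a read loop that bails out when the reader makes no forward
progress. It is not the kernel code and not a proposed patch: read_fn,
copy_with_progress_check() and the two toy readers are hypothetical names
invented for this illustration, with the reader standing in for vread_iter().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for vread_iter(): copies up to len bytes and
 * returns how many were copied; 0 means nothing could be read. */
typedef size_t (*read_fn)(char *dst, const char *src, size_t len);

/* Copy tsz bytes using reader. Returns the number of bytes copied,
 * or -1 if a call returns 0, so the loop cannot spin forever. */
long copy_with_progress_check(read_fn reader, char *dst,
                              const char *src, size_t tsz)
{
    size_t done = 0;

    assert(reader != NULL);
    while (done < tsz) {
        /* n counts this call only, so the offsets advance by exactly
         * the number of bytes just copied. */
        size_t n = reader(dst + done, src + done, tsz - done);

        if (n == 0)
            return -1; /* no forward progress: give up */
        done += n;
    }
    return (long)done;
}

/* Toy reader that always fails, like the arg1=0 returns in the trace. */
size_t reader_always_zero(char *dst, const char *src, size_t len)
{
    (void)dst; (void)src; (void)len;
    return 0;
}

/* Toy reader that copies at most 4 bytes per call (partial reads). */
size_t reader_chunked(char *dst, const char *src, size_t len)
{
    size_t n = len < 4 ? len : 4;

    memcpy(dst, src, n);
    return n;
}
```

With reader_always_zero the function returns -1 on the first call instead of
spinning. The loop quoted above has no such check, which is consistent with
the repeated arg1=0 returns in the trace below.
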
> A cursory "git blame" shows that this interplay (vread_iter() legitimately
> returning zero, and read_kcore_iter() *not* testing for it) has been there for
> quite some time. So, while this is arguably fragile, what is new in 6.13 may
> be the actual memory layout that triggers the failing path.
>
> Thanks for any insight, as this completely breaks debugging the running kernel
> in 6.13.
>
> -Alex
>
>
> ------------
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 0/0 #P:48
> #
> # TASK-PID CPU# TIMESTAMP FUNCTION
> # | | | | |
> <...>-3304 [045] 487.295283: kprobe_read_kcore_iter: (read_kcore_iter+0x4/0xae0) pos=0x7fffc0b6b000
> <...>-3304 [045] 487.295298: kprobe_vread_iter: (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> <...>-3304 [045] 487.295326: kretprobe_vread_iter: (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> <...>-3304 [045] 487.295329: kprobe_vread_iter: (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> <...>-3304 [045] 487.295338: kretprobe_vread_iter: (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> <...>-3304 [045] 487.295339: kprobe_vread_iter: (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> <...>-3304 [045] 487.295345: kretprobe_vread_iter: (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> <...>-3304 [045] 487.295347: kprobe_vread_iter: (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> <...>-3304 [045] 487.295352: kretprobe_vread_iter: (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> <...>-3304 [045] 487.295353: kprobe_vread_iter: (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> ...
>