Message-ID: <4fd7e1a3-f7ff-4b9d-9a53-fb73795b5b3d@lucifer.local>
Date: Fri, 17 Jan 2025 14:44:04 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Alexandre Ferrieux <alexandre.ferrieux@...il.com>,
linux-trace-users@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
linux-mm@...ck.org
Subject: Re: Bug: broken /proc/kcore in 6.13
On Fri, Jan 17, 2025 at 08:40:38AM -0500, Steven Rostedt wrote:
>
> [ Cc'ing the proper folks ]
>
> -- Steve
Thanks Steve!
>
>
> On Fri, 17 Jan 2025 11:36:05 +0100
> Alexandre Ferrieux <alexandre.ferrieux@...il.com> wrote:
>
> > Hi,
> >
> > Somewhere in the 6.13 branch (not bisected yet, sorry), it stopped being
> > possible to disassemble the running kernel from gdb through /proc/kcore.
Thanks for the report! Much appreciated.
I may also try to bisect this here, unless you're close to finding the
offending commit?
> >
> > More precisely:
> >
> > - look up a function in /proc/kallsyms => 0xADDRESS
> > - tell gdb to "core /proc/kcore"
> > - tell gdb to "disass 0xADDRESS,+LENGTH" (no need for a symbol table)
> >
> > * if the function is within the main kernel text, it is okay
> > * if the function is within a module's text, an infinite loop happens:
> >
> >
> > Example:
> >
> > # egrep -w ice_process_skb_fields\|ksys_write /proc/kallsyms
> > ffffffffaf296c80 T ksys_write
> > ffffffffc0b67180 t ice_process_skb_fields [ice]
> >
> > # gdb -ex "core /proc/kcore" -ex "disass 0xffffffffaf296c80,+256" -ex quit
> > ...
> > Dump of assembler code from 0xffffffffaf296c80 to 0xffffffffaf296d80:
> > ...
> > End of assembler dump.
> >
> > # gdb -ex "core /proc/kcore" -ex "disass 0xffffffffc0b67180,+256" -ex quit
> > ...
> > Dump of assembler code from 0xffffffffc0b67180 to 0xffffffffc0b67280:
> > (***NOTHING***)
> > ^C <= ineffective, need kill -9
> >
> >
> > Ftrace (see below) shows in this case read_kcore_iter() calls vread_iter() in an
> > infinite loop:
> >
> > while (true) {
> > read += vread_iter(iter, src, left);
> > if (read == tsz)
> > break;
> >
> > src += read;
> > left -= read;
> >
> > if (fault_in_iov_iter_writeable(iter, left)) {
> > ret = -EFAULT;
> > goto out;
> > }
> > }
> >
> > As it turns out, in the offending situation, vread_iter() keeps returning 0,
> > with "read" staying at its initial value of 0, and "tsz" nonzero. As a
> > consequence, "src" stays stuck in a place where vread_iter() fails.
> >
Yikes, this is my fault. Sorry about that!
There was some discussion at the time about the infinite loop, with the
understanding that vread_iter() should never return 0 in this scenario (where we
had already identified the _category_ of kernel memory being accessed) - an
assumption this report now shows to be false.
The fact that it can is rather problematic, so we need to patch this; and if
this was possible in real scenarios in the past, we would probably also want to
backport a fix.
In any case, I think we need an explicit check here regardless of the root
cause, so we can never loop like this. Missing it was an oversight at the time,
given that returning 0 is documented behaviour for vread_iter().
My instinct is to error out if this returns 0, because that would indicate that
the address is not part of the vmalloc area.
But then it seems add_modules_range() simply registers the module range under
the KCORE_VMALLOC category despite it not lying within the vmalloc range :/
which is really odd. That code was added a long time ago, so it is clearly not
what triggered this regression, but odd nonetheless.
In any case, let me go have a look at this...
> > A cursory "git blame" shows that this interplay (vread_iter() legitimately
> > returning zero, and read_kcore_iter() *not* testing it) has been there for
> > quite some time. So, while this is arguably fragile, possibly the new situation
> > lies in the actual memory layout that triggers the failing path.
> >
> > Thanks for any insight, as this completely breaks debugging the running kernel
> > in 6.13.
Apologies again. Let's figure this out and get this fixed!
Cheers, Lorenzo
> >
> > -Alex
> >
> >
> > ------------
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 0/0 #P:48
> > #
> > # TASK-PID CPU# TIMESTAMP FUNCTION
> > # | | | | |
> > <...>-3304 [045] 487.295283: kprobe_read_kcore_iter:
> > (read_kcore_iter+0x4/0xae0) pos=0x7fffc0b6b000
> > <...>-3304 [045] 487.295298: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> > <...>-3304 [045] 487.295326: kretprobe_vread_iter:
> > (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> > <...>-3304 [045] 487.295329: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> > <...>-3304 [045] 487.295338: kretprobe_vread_iter:
> > (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> > <...>-3304 [045] 487.295339: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> > <...>-3304 [045] 487.295345: kretprobe_vread_iter:
> > (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> > <...>-3304 [045] 487.295347: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> > <...>-3304 [045] 487.295352: kretprobe_vread_iter:
> > (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
> > <...>-3304 [045] 487.295353: kprobe_vread_iter:
> > (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> > ...
> >
>