Message-ID: <aQy_TikwHgkb9iNA@fedora>
Date: Thu, 6 Nov 2025 07:31:26 -0800
From: "Vishal Moola (Oracle)" <vishal.moola@...il.com>
To: Jan Kiszka <jan.kiszka@...mens.com>
Cc: linux-kernel@...r.kernel.org, Kieran Bingham <kbingham@...nel.org>
Subject: Re: GDB causing OOPS on insmod
On Thu, Nov 06, 2025 at 07:07:28AM +0100, Jan Kiszka wrote:
> On 06.11.25 00:26, Vishal Moola (Oracle) wrote:
> > I'm on an x86 defconfig + GDB_SCRIPTS + DEBUG_VM + PAGE_OWNER kernel. Running
> > 'lx-symbols' in gdb before loading modules causes the kernel to OOPS on
> > module load:
> >
> > [ 13.627373] BUG: kernel NULL pointer dereference, address: 0000000000000900
> > [ 13.627376] #PF: supervisor write access in kernel mode
> > [ 13.627377] #PF: error_code(0x0002) - not-present page
> > [ 13.627378] PGD 0 P4D 0
> > [ 13.627379] Oops: Oops: 0002 [#1] SMP PTI
> > [ 13.627383] CPU: 0 UID: 0 PID: 279 Comm: insmod Not tainted 6.18.0-rc3+ #163 PREEMPT(voluntary)
> > [ 13.627384] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-6.fc43 04/01/2014
> > [ 13.627385] RIP: 0010:__kernel_read+0x210/0x2f0
> > [ 13.627390] Code: 00 40 0f 84 bd 00 00 00 48 3b 7f 18 0f 84 c3 00 00 00 48 89 f2 b9 02 00 00 00 44 89 d6 e8 78 6c 06 00 4d 01 ac 24 f0 08 00 00 <49> 83 84 24 00 09 00 00 01 48 8b 45 e0 65 48 2b 05 53 38 c7 01 0f
> > [ 13.627391] RSP: 0018:ffffc900002f7c68 EFLAGS: 00010246
> > [ 13.627393] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> > [ 13.627393] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff82d47e70
> > [ 13.627394] RBP: 00000000002f7cf8 R08: 0000000000000000 R09: 0000000000000000
> > [ 13.627394] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> > [ 13.627395] R13: 0000000000000000 R14: ffffc900002f7d10 R15: ffffc900002f7d10
> > [ 13.627399] FS: 00007f704851c740(0000) GS:ffff8880bba45000(0000) knlGS:0000000000000000
> > [ 13.627401] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 13.627402] CR2: 0000000000000900 CR3: 00000000095f2000 CR4: 00000000000006f0
> > [ 13.627406] Call Trace:
> > [ 13.627407] <TASK>
> > [ 13.627409] ? init_module_from_file+0x92/0xd0
> > [ 13.627412] ? init_module_from_file+0x92/0xd0
> > [ 13.627414] ? idempotent_init_module+0x109/0x2f0
> > [ 13.627416] ? __x64_sys_finit_module+0x60/0xb0
> > [ 13.627418] ? x64_sys_call+0x1a74/0x1da0
> > [ 13.627421] ? do_syscall_64+0xa4/0x290
> > [ 13.627429] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > [ 13.627431] </TASK>
> > [ 13.627431] Modules linked in: test_xarray
> > [ 13.627433] CR2: 0000000000000900
> > [ 13.627434] ---[ end trace 0000000000000000 ]---
> > [ 13.627435] RIP: 0010:__kernel_read+0x210/0x2f0
> > [ 13.627437] Code: 00 40 0f 84 bd 00 00 00 48 3b 7f 18 0f 84 c3 00 00 00 48 89 f2 b9 02 00 00 00 44 89 d6 e8 78 6c 06 00 4d 01 ac 24 f0 08 00 00 <49> 83 84 24 00 09 00 00 01 48 8b 45 e0 65 48 2b 05 53 38 c7 01 0f
> > [ 13.627438] RSP: 0018:ffffc900002f7c68 EFLAGS: 00010246
> > [ 13.627439] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> > [ 13.627439] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff82d47e70
> > [ 13.627440] RBP: 00000000002f7cf8 R08: 0000000000000000 R09: 0000000000000000
> > [ 13.627440] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> > [ 13.627440] R13: 0000000000000000 R14: ffffc900002f7d10 R15: ffffc900002f7d10
> > [ 13.627442] FS: 00007f704851c740(0000) GS:ffff8880bba45000(0000) knlGS:0000000000000000
> > [ 13.627444] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 13.627445] CR2: 0000000000000900 CR3: 00000000095f2000 CR4: 00000000000006f0
> >
> > I used module test_xarray for the purpose of demonstration, but this does
> > happen with all modules.
> >
> > I have no clue what patch caused this, or when this bug was introduced.
> > I played around with the scripts a bit and found the diff below eliminates
> > this issue entirely:
> >
> > diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
> > index 6edb99221675..8b507907e044 100644
> > --- a/scripts/gdb/linux/symbols.py
> > +++ b/scripts/gdb/linux/symbols.py
> > @@ -44,8 +44,7 @@ if hasattr(gdb, 'Breakpoint'):
> > "'{0}'\n".format(module_name))
> > cmd.load_all_symbols()
> > else:
> > - cmd.load_module_symbols(module)
> > -
> > + cmd.load_all_symbols()
> > return False
> >
> > Does anyone know what's going on here? And is this the fix we should upstream?
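For anyone reading the oops above, here is a small standalone sketch of how I read the fault lines. The three bit definitions are the architectural low bits of the x86 page-fault error code; the helper names (decode_pf_error_code, effective_address) are made up for this mail. By my reading, the marked bytes <49> 83 84 24 00 09 00 00 01 are an 8-byte add of 1 to the address R12 + 0x900, and the dump shows R12 as 0, which would match CR2. Treat that decode as my interpretation, not a confirmed diagnosis.

```python
# Low three bits of the x86 page-fault error code.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access,      1: write access
PF_USER = 1 << 2   # 0: supervisor mode,  1: user mode


def decode_pf_error_code(code):
    """Hypothetical helper: name the low three bits of a #PF error code."""
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "supervisor",
    }


def effective_address(base, disp):
    """Hypothetical helper: base register plus displacement, wrapped to 64 bits."""
    return (base + disp) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    # Values copied from the oops: error_code(0x0002), R12 = 0, displacement 0x900.
    print(decode_pf_error_code(0x0002))
    print(hex(effective_address(0x0, 0x900)))
```

The decoded fields line up with the "supervisor write access" and "not-present page" lines, and the computed address equals the CR2 value in the log.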
>
> Are you using kvm or tcg with qemu? Is the issue gone when switching the
> accelerator mode?
I'm using kvm. Switching to tcg works; I hadn't thought to try that :)
kvm is definitely faster though, so I'd prefer a fix that keeps kvm
working.
> And when do you attach to the kernel here? System booted, idle, attach,
> continue, load (another) module?
I've tried attaching at all of those points, and it always kills
whatever module I attempt to load after attaching gdb and running
lx-symbols. In other words, I do not run into this if I never run gdb,
or if I detach gdb before loading a module.
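To make the behavioural change in my diff concrete, here is a standalone model of the decision the module-load breakpoint handler makes. This is only a sketch: FakeSymbolsCommand and on_module_load are hypothetical names, gdb is stubbed out entirely, and load_module_symbols takes a plain name here rather than the gdb value the real script passes. It shows which call gets made, not why the unpatched path trips the oops under kvm.

```python
class FakeSymbolsCommand:
    """Hypothetical stand-in for the lx-symbols command object; records calls."""

    def __init__(self, loaded_modules=()):
        self.loaded_modules = list(loaded_modules)
        self.calls = []

    def load_all_symbols(self):
        self.calls.append("load_all_symbols")

    def load_module_symbols(self, module_name):
        self.calls.append("load_module_symbols:" + module_name)


def on_module_load(cmd, module_name, patched):
    """Model of the handler's branch; 'patched' selects the behaviour of my diff."""
    if module_name in cmd.loaded_modules:
        # Known module: both versions refresh everything.
        cmd.load_all_symbols()
    elif patched:
        # With the diff: a new module also triggers a full reload.
        cmd.load_all_symbols()
    else:
        # Without the diff: only the new module's symbols are loaded.
        cmd.load_module_symbols(module_name)
    # As in the quoted diff context, the handler returns False.
    return False


if __name__ == "__main__":
    for patched in (False, True):
        cmd = FakeSymbolsCommand()
        on_module_load(cmd, "test_xarray", patched)
        print(patched, cmd.calls)
```

So the diff does not touch the already-loaded case at all; it only replaces the per-module load for a newly inserted module with a full reload, which is why I'm unsure it is the right fix rather than a workaround.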