[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5740a4d2-4402-451a-83bd-875331b5de11@amd.com>
Date: Tue, 5 Sep 2023 14:25:35 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Dave Hansen <dave.hansen@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Saranya Gopal <saranya.gopal@...el.com>,
Rajaram Regupathy <rajaram.regupathy@...el.com>,
Heikki Krogerus <heikki.krogerus@...ux.intel.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Uwe Kleine-König <u.kleine-koenig@...gutronix.de>,
Wayne Chang <waynec@...dia.com>,
Hans de Goede <hdegoede@...hat.com>,
Neil Armstrong <neil.armstrong@...aro.org>,
linux-usb@...r.kernel.org
Subject: Re: ucsi debugfs oops (current Linus pre-6.6-rc1)
On 9/5/2023 14:10, Dave Hansen wrote:
> I'm having some problems booting Linus's current tree. It seems to have
> happened in some content between commit 3f86ed6ec0b3 and df0383ffad.
>
> I'm suspecting this commit:
>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df0383ffad64dc09954a60873c1e202b47f08d90
>
> I'm seeing a null pointer oops on this line:
>
> void ucsi_debugfs_unregister(struct ucsi *ucsi)
> {
> ===> debugfs_remove_recursive(ucsi->debugfs->dentry);
> kfree(ucsi->debugfs);
> }
>
> on this instruction:
>
> 66 0f 1f 00 nop WORD PTR [rax]
> 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
> 53 push rbx
> 48 8b 47 38 mov rax,QWORD PTR [rdi+0x38]
> 48 89 fb mov rbx,rdi
> => 48 8b 78 20 mov rdi,QWORD PTR [rax+0x20]
> e8 36 16 26 e1 call 0xffffffffe1261669
> 48 8b 7b 38 mov rdi,QWORD PTR [rbx+0x38]
> 5b pop rbx
> e9 5c 79 03 e1 jmp 0xffffffffe1037999
>
> That's the second dereference in the function, so I assume this is
> trying to dereference 'debugfs' above. It appears that this is some
> failure/error path out of ucsi_acpi_probe() that's not handled correctly.
>
> Probably this:
>
>> if (ACPI_FAILURE(status)) {
>> dev_err(&pdev->dev, "failed to install notify handler\n");
>> ucsi_destroy(ua->ucsi);
>> return -ENODEV;
>> }
>>
>> ret = ucsi_register(ua->ucsi);
>
> where ucsi_destroy() is called before ucsi_register(). Although I do
> _not_ see the dev_err() message anywhere.
If your theory is right could it be that the printk handler was racing
and that's why it didn't come up?
In any case I'd think you can add this to ucsi_debugfs_unregister() to
avoid it.
if (!ucsi->debugfs)
return;
>
> Full oops is below.
>
> I'll try putting some hacks in place to avoid the null pointer. Also,
> please forgive the lack of a bisect for the moment. This is happening
> on my main laptop and it's a mild pain to do bisects on here.
>
>> [ 4.903493] BUG: kernel NULL pointer dereference, address: 0000000000000020^M
>> [ 4.905624] #PF: supervisor read access in kernel mode^M
>> [ 4.907326] #PF: error_code(0x0000) - not-present page^M
>> [ 4.908993] PGD 0 P4D 0 ^M
>> [ 4.910998] Oops: 0000 [#1] PREEMPT SMP NOPTI^M
>> [ 4.913077] CPU: 6 PID: 150 Comm: systemd-udevd Not tainted 6.5.0-11704-g3f86ed6ec0b3 #138^M
>> [ 4.915211] Hardware name: Framework Laptop/FRANBMCP0B, BIOS 03.10 07/19/2022^M
>> [ 4.917355] RIP: 0010:ucsi_debugfs_unregister+0x11/0x30 [typec_ucsi]^M
>> [ 4.919705] Code: 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 53 48 8b 47 38 48 89 fb <48> 8b 78 20 e8 36 16 26 e1 48 8b 7b 38 5b e9 5c 79 03 e1 66 66 2e^M
>> [ 4.921982] RSP: 0018:ffffc900007e7bb8 EFLAGS: 00010246^M
>> [ 4.924227] RAX: 0000000000000000 RBX: ffff888101b2be00 RCX: 0000000000009a06^M
>> [ 4.926752] RDX: 0000000000000000 RSI: ffff888104491798 RDI: ffff888101b2be00^M
>> [ 4.929312] RBP: ffff888101b2be00 R08: 0000000000009906 R09: 00000000000333f0^M
>> [ 4.931887] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffed^M
>> [ 4.934451] R13: ffff888102594810 R14: ffff888100653600 R15: ffff888101fa7f78^M
>> [ 4.937115] FS: 00007f5dd0fb48c0(0000) GS:ffff88906fb80000(0000) knlGS:0000000000000000^M
>> [ 4.939581] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
>> [ 4.941308] CR2: 0000000000000020 CR3: 0000000105070005 CR4: 0000000000f70ee0^M
>> [ 4.943022] PKRU: 55555554^M
>> [ 4.944731] Call Trace:^M
>> [ 4.946438] <TASK>^M
>> [ 4.948167] ? __die+0x24/0x70^M
>> [ 4.949864] ? page_fault_oops+0x15b/0x440^M
>> [ 4.951563] ? acpi_evaluate_object+0x190/0x2f0^M
>> [ 4.953201] ? _raw_spin_lock_irqsave+0x28/0x50^M
>> [ 4.954841] ? exc_page_fault+0x6e/0x160^M
>> [ 4.956461] ? asm_exc_page_fault+0x26/0x30^M
>> [ 4.958067] ? ucsi_debugfs_unregister+0x11/0x30 [typec_ucsi]^M
>> [ 4.959677] ucsi_destroy+0x12/0x20 [typec_ucsi]^M
>> [ 4.961298] ucsi_acpi_probe+0x1cc/0x230 [ucsi_acpi]^M
>> [ 4.962908] platform_probe+0x40/0xb0^M
>> [ 4.964522] really_probe+0x1a2/0x410^M
>> [ 4.966110] __driver_probe_device+0x78/0x160^M
>> [ 4.967735] driver_probe_device+0x1e/0x90^M
>> [ 4.969306] __driver_attach+0xd6/0x1d0^M
>> [ 4.970874] ? __pfx___driver_attach+0x10/0x10^M
>> [ 4.972449] bus_for_each_dev+0x79/0xd0^M
>> [ 4.974022] bus_add_driver+0x116/0x220^M
>> [ 4.975600] driver_register+0x60/0x120^M
>> [ 4.977169] ? __pfx_ucsi_acpi_platform_driver_init+0x10/0x10 [ucsi_acpi]^M
>> [ 4.978762] do_one_initcall+0x45/0x220^M
>> [ 4.980367] ? kmalloc_trace+0x29/0x90^M
>> [ 4.981952] do_init_module+0x90/0x260^M
>> [ 4.983530] init_module_from_file+0x8b/0xd0^M
>> [ 4.985087] idempotent_init_module+0x181/0x240^M
>> [ 4.986639] __x64_sys_finit_module+0x5e/0xb0^M
>> [ 4.988198] do_syscall_64+0x3c/0x90^M
>> [ 4.989739] entry_SYSCALL_64_after_hwframe+0x6e/0xd8^M
>> [ 4.991290] RIP: 0033:0x7f5dd16aaa3d^M
Powered by blists - more mailing lists