Message-ID: <20230922110720.AZ03l3A9@linutronix.de>
Date: Fri, 22 Sep 2023 13:07:20 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: "John B. Wyatt IV" <jwyatt@...hat.com>
Cc: linux-rt-users@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>,
kernel-rts-sst <kernel-rts-sst@...hat.com>, jlelli@...hat.com
Subject: Re: Crash with 6.6.0-rc1-rt1 and several i915 locking call traces
with v6.5.2-rt8 and gnome-shell on Alder Lake laptop
On 2023-09-20 18:07:35 [-0400], John B. Wyatt IV wrote:
> Hello everyone,
Hi,
> While backporting i915 fixes to the RHEL9 kernel for a similar looking
> issue; I noticed the commits that worked for RHEL8 did not work for RHEL9.
>
> Testing the (almost) latest release: 6.5.2-rt8; showed a lot of call traces
> on RHEL9. [1] being the most common one and it repeats itself on suspend.
A warn-once might help to reduce them so they can be worked on one by
one.
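A rough userspace sketch of the warn-once idea, assuming GNU C statement
expressions (GCC/clang). SKETCH_WARN_ONCE, warnings_printed and
run_checks are made-up names for illustration only, this is not the
kernel's macro:

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts printed warnings; exists only so the sketch can be checked. */
static int warnings_printed;

/*
 * Hypothetical stand-in for a warn-once helper: report the condition the
 * first time it is true at this call site and stay quiet afterwards.
 * The macro still evaluates to the condition so callers can branch on it.
 */
#define SKETCH_WARN_ONCE(cond, msg)                                \
	({                                                         \
		static bool warned_;   /* one flag per call site */ \
		bool hit_ = (cond);                                \
		if (hit_ && !warned_) {                            \
			warned_ = true;                            \
			warnings_printed++;                        \
			fprintf(stderr, "WARNING: %s\n", msg);     \
		}                                                  \
		hit_;                                              \
	})

/* Hypothetical caller that trips the same check on every iteration. */
static int run_checks(int iterations)
{
	int hits = 0;

	for (int i = 0; i < iterations; i++)
		if (SKETCH_WARN_ONCE(i >= 0, "sleeping call in invalid context"))
			hits++;
	return hits;
}
```

The condition is still seen on every pass, but only the first hit is
reported, so each distinct call site shows up once in the log.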
> [2] was the second one to show and seems to be the second most common
> call trace. This was tested on a Framework Alder Lake laptop with i915
> graphics. There was a total of 36 call traces before suspend and
> additional 12 after suspend (once again, [1]).
>
> When I tested on 6.6.0-rc1-rt1 the kernel crashed on boot. I did not
> have a way to pull the information and was transcribed manually. [3]
>
> [1]
Both:
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 6, expected: 0
> 12 locks held by gnome-shell/6590:
…
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 5, expected: 0
are might-sleep splats. I don't see these on my notebook/desktop on
6.6-rc. I don't remember doing a suspend on the notebook with 6.5, but
I did one on my desktop for testing.
It looks like, due to the held "locks", the RCU nest depth is > 0, and
the splat triggers because the check assumes the task will schedule out,
which is okay on RT. But then it is not okay for the ww-mutex to do so,
so I am a little confused whether this is an RT-only problem or one that
also affects non-RT. But maybe it is just a try-lock and the warning is
wrongly triggered…
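To illustrate why only the RCU line matters in the splats above, here is
a simplified userspace model of the comparison. It is a sketch, not the
kernel's might-sleep code; struct ctx_sketch and would_splat() are
hypothetical names, and the fields simply mirror what the report prints:

```c
#include <stdbool.h>

/* Hypothetical snapshot of the values a might-sleep report prints. */
struct ctx_sketch {
	unsigned int preempt_count;
	unsigned int rcu_depth;
	bool irqs_disabled;
	unsigned int non_block;
};

/*
 * Simplified model: the context is flagged if interrupts are off, a
 * non-blocking section is active, or either counter differs from the
 * value the caller said it expects.
 */
static bool would_splat(const struct ctx_sketch *c,
			unsigned int expected_preempt,
			unsigned int expected_rcu)
{
	if (c->irqs_disabled || c->non_block)
		return true;
	return c->preempt_count != expected_preempt ||
	       c->rcu_depth != expected_rcu;
}
```

With the values from the first report (preempt_count 0, RCU nest depth
6, both expected to be 0) only the RCU comparison fails in this model.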
> [3]
>
> general protection fault, probably for non-canonical address 0xdffffc0004: 0000(#1) PREEMPT_RT SMP KASAN NOPRI
> KASAM: null-ptr-deref in range [0x000...20-0x000...27]
> RIP: 0010:ucsi_debugfs_unregister (drivers/usb/typec/ucsi/debugfs.c:87)
> [snipped]
> PKRU: 5555554
> Call Trace:
> <TASK>
> usci_destroy+0xe/0x20
> ucsi_acpi_probe (drivers/usb/typec/ucsi/ucsi_acpi.c:207)
This is odd. That means that ucsi_register() failed and debugfs was not
set up and is NULL. The check in line 87 tests ucsi, which is non-NULL,
and ucsi->debugfs, which is NULL. So it should return early, but somehow
it faults there instead. Does this also trigger without KASAN?
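The behaviour I would expect from such a guard, as a minimal userspace
sketch. This is not the code from drivers/usb/typec/ucsi/debugfs.c; the
struct and function names are hypothetical and only model the two NULL
checks described above:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct debugfs_sketch {
	int entries;
};

struct ucsi_sketch {
	struct debugfs_sketch *debugfs;
};

/*
 * Returns 1 if teardown work was done, 0 if the guard returned early.
 * Both a NULL object and a NULL debugfs pointer must take the early
 * return; nothing behind the pointer is touched in either case.
 */
static int debugfs_unregister_sketch(struct ucsi_sketch *ucsi)
{
	if (ucsi == NULL || ucsi->debugfs == NULL)
		return 0;

	ucsi->debugfs->entries = 0;
	ucsi->debugfs = NULL;
	return 1;
}
```

With ucsi non-NULL and ucsi->debugfs NULL, this takes the early return,
which is why a fault at that point is surprising.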
In the meantime let me try to enable KASAN…
Sebastian