linux-kernel - Re: Crash with 6.6.0-rc1-rt1 and several i915 locking call traces with v6.5.2-rt8 and gnome-shell on Alder Lake laptop

Message-ID: <20230922110720.AZ03l3A9@linutronix.de>
Date:   Fri, 22 Sep 2023 13:07:20 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     "John B. Wyatt IV" <jwyatt@...hat.com>
Cc:     linux-rt-users@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        kernel-rts-sst <kernel-rts-sst@...hat.com>, jlelli@...hat.com
Subject: Re: Crash with 6.6.0-rc1-rt1 and several i915 locking call traces
 with v6.5.2-rt8 and gnome-shell on Alder Lake laptop

On 2023-09-20 18:07:35 [-0400], John B. Wyatt IV wrote:
> Hello everyone,
Hi,

> While backporting i915 fixes to the RHEL9 kernel for a similar looking
> issue; I noticed the commits that worked for RHEL8 did not work for RHEL9.
> 
> Testing the (almost) latest release: 6.5.2-rt8; showed a lot of call traces
> on RHEL9. [1] being the most common one and it repeats itself on suspend.

A warn-once might help to reduce them so they can be worked on one by
one.
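
A minimal userspace sketch of the warn-once idea, assuming a per-call-site
static flag. The names sketch_warn_once(), SKETCH_WARN_ONCE() and
sketch_hit_call_site() are hypothetical and only illustrate the pattern; in
the kernel the WARN_ON_ONCE() family would be the thing to reach for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts warnings that were actually printed, so the effect can be checked. */
static int sketch_warnings_emitted;

/* Report a condition at most once per flag; later hits stay silent. */
static void sketch_warn_once(bool *warned, bool cond, const char *msg)
{
	assert(warned && msg);
	if (!cond || *warned)
		return;
	*warned = true;
	sketch_warnings_emitted++;
	fprintf(stderr, "WARNING (once): %s\n", msg);
}

/* Each expansion gets its own static flag, i.e. one report per call site. */
#define SKETCH_WARN_ONCE(cond, msg)                                  \
	do {                                                         \
		static bool warned_;                                 \
		sketch_warn_once(&warned_, (cond), (msg));           \
	} while (0)

/* Hit the same call site repeatedly; returns the total warnings printed. */
static int sketch_hit_call_site(int times)
{
	for (int i = 0; i < times; i++)
		SKETCH_WARN_ONCE(true,
				 "sleeping function called from invalid context");
	return sketch_warnings_emitted;
}
```

With this, hitting the same site 36 times and then 12 more times still
produces a single report, which leaves room to look at the next distinct
trace.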

> [2] was the second one to show and seems to be the second most common
> call trace. This was tested on a Framework Alder Lake laptop with i915
> graphics. There was a total of 36 call traces before suspend and
> additional 12 after suspend (once again, [1]).
> 
> When I tested on 6.6.0-rc1-rt1 the kernel crashed on boot. I did not
> have a way to pull the information and was transcribed manually. [3]
> 
> [1]

Both:
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 6, expected: 0
> 12 locks held by gnome-shell/6590:
…
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 5, expected: 0

are might-sleep splats. I don't see these on my notebook/desktop on
6.6-rc. I don't remember doing a suspend on my notebook with 6.5, but I
did on my desktop for testing.
It looks like, due to the "locks", the RCU nest depth is > 0 and then the
splat triggers because the check assumes that it will schedule out, which
is okay on RT. But then it is not okay for the ww-mutex to do so, so I am
a little confused whether this is an RT-only problem or also a non-RT
one. But maybe it is just a try-lock and the warning is wrongly
triggered…
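
A simplified userspace model of that reasoning. It assumes that on
PREEMPT_RT every held sleeping lock also bumps the RCU nest depth, and that
the might-sleep check complains whenever the depth differs from the expected
0. All sketch_* names are hypothetical; this is not the kernel's actual
check.

```c
#include <assert.h>
#include <stdbool.h>

/* The fields the splat prints: preempt_count, RCU nest depth, irqs_disabled. */
struct sketch_task {
	int preempt_count;
	int rcu_nest_depth;
	bool irqs_disabled;
};

/* Assumption: taking a sleeping lock on RT enters an RCU read-side section. */
static void sketch_rt_lock(struct sketch_task *t)
{
	assert(t);
	t->rcu_nest_depth++;
}

static void sketch_rt_unlock(struct sketch_task *t)
{
	assert(t && t->rcu_nest_depth > 0);
	t->rcu_nest_depth--;
}

/* Simplified check: anything other than the expected zeroes is a splat. */
static bool sketch_might_sleep_splats(const struct sketch_task *t)
{
	return t->irqs_disabled || t->preempt_count != 0 ||
	       t->rcu_nest_depth != 0;
}

/* Take nr_locks locks, run the check, then drop the locks again. */
static bool sketch_splat_with_locks_held(int nr_locks)
{
	struct sketch_task t = { 0 };
	bool splat;

	for (int i = 0; i < nr_locks; i++)
		sketch_rt_lock(&t);
	splat = sketch_might_sleep_splats(&t);
	for (int i = 0; i < nr_locks; i++)
		sketch_rt_unlock(&t);
	return splat;
}
```

With six locks held the model reports a splat even though preempt_count is 0
and interrupts are enabled, matching "RCU nest depth: 6, expected: 0" in the
report above.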

> [3]
> 
> general protection fault, probably for non-canonical address 0xdffffc0004: 0000(#1) PREEMPT_RT SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x000...20-0x000...27]
> RIP: 0010:ucsi_debugfs_unregister (drivers/usb/typec/ucsi/debugfs.c:87) 
> [snipped]
> PKRU: 5555554
> Call Trace:
> <TASK>
> ucsi_destroy+0xe/0x20
> ucsi_acpi_probe (drivers/usb/typec/ucsi/ucsi_acpi.c:207) 

This is odd. It means that ucsi_register() failed and debugfs was not
set up, so it is NULL. The check in line 87 tests ucsi, which is
non-NULL, and ucsi->debugfs, which is NULL. So it should return early,
but somehow it crashes there instead. Does this also trigger without
KASAN?
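
A userspace sketch of the early return I would expect on that path. The
struct layouts and the sketch_* names are hypothetical stand-ins and not the
driver's actual code; they only model "ucsi is non-NULL, ucsi->debugfs is
NULL".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures. */
struct sketch_debugfs {
	int dentry;
};

struct sketch_ucsi {
	struct sketch_debugfs *debugfs;
};

/*
 * Returns true if there was something to tear down, false if the guard
 * returned early because either pointer was NULL.
 */
static bool sketch_debugfs_unregister(struct sketch_ucsi *ucsi)
{
	if (!ucsi || !ucsi->debugfs)
		return false;

	free(ucsi->debugfs);
	ucsi->debugfs = NULL;
	return true;
}

/* Models the error path of a probe where debugfs may never have been set up. */
static bool sketch_probe_error_path(bool debugfs_was_set_up)
{
	struct sketch_ucsi ucsi = { .debugfs = NULL };

	if (debugfs_was_set_up) {
		ucsi.debugfs = calloc(1, sizeof(*ucsi.debugfs));
		assert(ucsi.debugfs != NULL);
	}
	return sketch_debugfs_unregister(&ucsi);
}
```

With debugfs never set up, the guard returns before anything is
dereferenced, which is why a fault at that spot is surprising.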

In the meantime let me try to enable KASAN…

Sebastian