Message-ID: <20230922110720.AZ03l3A9@linutronix.de>
Date:   Fri, 22 Sep 2023 13:07:20 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     "John B. Wyatt IV" <jwyatt@...hat.com>
Cc:     linux-rt-users@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        kernel-rts-sst <kernel-rts-sst@...hat.com>, jlelli@...hat.com
Subject: Re: Crash with 6.6.0-rc1-rt1 and several i915 locking call traces
 with v6.5.2-rt8 and gnome-shell on Alder Lake laptop

On 2023-09-20 18:07:35 [-0400], John B. Wyatt IV wrote:
> Hello everyone,
Hi,

> While backporting i915 fixes to the RHEL9 kernel for a similar-looking
> issue, I noticed that the commits that worked for RHEL8 did not work for RHEL9.
> 
> Testing the (almost) latest release, 6.5.2-rt8, showed a lot of call traces
> on RHEL9. [1] is the most common one, and it repeats on suspend.

A warn-once might help to reduce them so they can be worked on one by
one.
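A minimal userspace sketch of the warn-once idea, assuming a hypothetical `WARN_ONCE_SKETCH()` macro (it is not the kernel's `WARN_ONCE()` or `printk_once()`, which have their own implementations). Each call site owns a static flag, so a splat that fires 36 times is reported once and the distinct traces can be triaged one at a time.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter, present only so the behaviour can be checked. */
static int warnings_printed;

/*
 * Hypothetical warn-once macro: every expansion gets its own static
 * flag, so each call site reports at most once.
 */
#define WARN_ONCE_SKETCH(fmt, ...)                              \
	do {                                                    \
		static bool warned_;                            \
		if (!warned_) {                                 \
			warned_ = true;                         \
			warnings_printed++;                     \
			fprintf(stderr, "WARNING: " fmt "\n",   \
				__VA_ARGS__);                   \
		}                                               \
	} while (0)

/* Simulates one code path that would otherwise splat on every call. */
static void sleeping_call_in_atomic(int pid)
{
	WARN_ONCE_SKETCH("sleeping function called from invalid context, pid %d",
			 pid);
}
```

Calling `sleeping_call_in_atomic(6590)` in a loop prints a single warning; a different call site using the macro would still get its own first report.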

> [2] was the second one to show up and seems to be the second most common
> call trace. This was tested on a Framework Alder Lake laptop with i915
> graphics. There were a total of 36 call traces before suspend and an
> additional 12 after suspend (once again, [1]).
> 
> When I tested 6.6.0-rc1-rt1, the kernel crashed on boot. I did not
> have a way to pull the information, so it was transcribed manually. [3]
> 
> [1]

Both:
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 6, expected: 0
> 12 locks held by gnome-shell/6590:
…
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 5, expected: 0

are might-sleep splats. I don't see these on my notebook or desktop with
6.6-rc. I don't remember suspending the notebook on 6.5, but I did suspend
my desktop for testing.
It looks like, due to the held "locks", the RCU nest depth is > 0, and the
splat triggers because the check assumes the task will schedule out, which
is okay on RT. But then it is not okay for the ww-mutex to do so, so I am a
little confused whether this is an RT-only problem or also a !RT one. But
maybe it is just a trylock and the warning is triggered wrongly…
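A heavily simplified userspace model of the reasoning above, assuming (as on PREEMPT_RT) that acquiring a spinlock_t also enters an RCU read-side section. All names here (`rt_spin_lock_model()`, `might_sleep_model()`, `mutex_lock_model()`, `mutex_trylock_model()`) are hypothetical; the real check lives in `__might_resched()` and compares more state than this. Each held lock raises the RCU nest depth, so a later blocking mutex acquisition sees a depth above the expected 0 and splats, while a trylock path that (in this model) never calls the check stays silent.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-task state tracked by the model. */
static int rcu_nest_depth;
static int splats;

/* Model of the might-sleep check: RCU depth must match the expected value. */
static void might_sleep_model(const char *where, int expected_depth)
{
	if (rcu_nest_depth != expected_depth) {
		splats++;
		printf("BUG: sleeping function called from invalid context at %s\n",
		       where);
		printf("RCU nest depth: %d, expected: %d\n",
		       rcu_nest_depth, expected_depth);
	}
}

/* Assumption: on PREEMPT_RT a spinlock_t acquisition implies rcu_read_lock(). */
static void rt_spin_lock_model(void)   { rcu_nest_depth++; }
static void rt_spin_unlock_model(void) { rcu_nest_depth--; }

/* A blocking lock (e.g. a ww-mutex) checks before it may schedule out. */
static void mutex_lock_model(void)
{
	might_sleep_model("mutex_lock_model", 0);
	/* ... a real lock would block here ... */
}

/* In this model a trylock never sleeps, so it skips the check entirely. */
static bool mutex_trylock_model(void)
{
	return true;
}
```

If the real code path is a trylock, a warning from it would be a false positive, which is the open question in the paragraph above.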

> [3]
> 
> general protection fault, probably for non-canonical address 0xdffffc0004: 0000 [#1] PREEMPT_RT SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x000...20-0x000...27]
> RIP: 0010:ucsi_debugfs_unregister (drivers/usb/typec/ucsi/debugfs.c:87) 
> [snipped]
> PKRU: 5555554
> Call Trace:
> <TASK>
> ucsi_destroy+0xe/0x20 
> ucsi_acpi_probe (drivers/usb/typec/ucsi/ucsi_acpi.c:207) 

This is odd. It means that ucsi_register() failed while ucsi->debugfs is
NULL. The check in line 87 tests ucsi, which is non-NULL, and
ucsi->debugfs, which is NULL, so the function should return early, yet
somehow it still faults. Does this also trigger without KASAN?
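The early return expected at line 87 can be modelled in userspace as below. The struct layouts and the `ucsi_debugfs_unregister_model()` name are hypothetical and only mirror the shape of the check described above (a non-NULL `ucsi` with a NULL `ucsi->debugfs` must bail out before anything is dereferenced); this is not the driver's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures. */
struct debugfs_info_model {
	void *dentry;
};

struct ucsi_model {
	struct debugfs_info_model *debugfs;
};

/* Returns true if the teardown path ran, false on the early return. */
static bool ucsi_debugfs_unregister_model(struct ucsi_model *ucsi)
{
	/* The guard: nothing to tear down if ucsi or its debugfs is missing. */
	if (ucsi == NULL || ucsi->debugfs == NULL)
		return false;

	/* Only reached when ucsi->debugfs is valid, so this is safe. */
	free(ucsi->debugfs);
	ucsi->debugfs = NULL;
	return true;
}
```

With this guard, the failed-registration case (non-NULL `ucsi`, NULL `debugfs`) never reaches a dereference, which is why a fault at this point is surprising.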

In the meantime let me try to enable KASAN…
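For reading the report in [3]: with generic KASAN on x86_64, the non-canonical general protection fault address is the shadow address of the faulting access, and KASAN maps it back to the 8-byte range that shadow byte covers. A minimal sketch of that arithmetic, assuming the usual x86_64 shadow offset of 0xdffffc0000000000 and one shadow byte per 8 bytes of memory (the helper names are hypothetical): an access to 0x20 lands on shadow byte offset 4, and shadow offset 4 maps back to the range 0x20-0x27, which is consistent with the manually transcribed "null-ptr-deref in range" line.

```c
#include <stdint.h>

/* Assumed x86_64 generic-KASAN parameters (CONFIG_KASAN_SHADOW_OFFSET). */
#define KASAN_SHADOW_OFFSET_MODEL      0xdffffc0000000000ULL
#define KASAN_SHADOW_SCALE_SHIFT_MODEL 3

/* Shadow byte that tracks a given address (one shadow byte per 8 bytes). */
static uint64_t mem_to_shadow_model(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT_MODEL) + KASAN_SHADOW_OFFSET_MODEL;
}

/* First address covered by a shadow byte; the covered range ends 7 bytes later. */
static uint64_t shadow_to_mem_model(uint64_t shadow)
{
	return (shadow - KASAN_SHADOW_OFFSET_MODEL) << KASAN_SHADOW_SCALE_SHIFT_MODEL;
}
```

A range starting this close to zero typically means a member at offset 0x20 was read through a NULL base pointer.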

Sebastian
