linux-kernel - Re: Crash with 6.6.0-rc1-rt1 and several i915 locking call traces with v6.5.2-rt8 and gnome-shell on Alder Lake laptop

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <F4pA9qG-zsUd2s-KZZ8oHrbd@linutronix.de>
Date:   Mon, 2 Oct 2023 11:45:45 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     "John B. Wyatt IV" <jwyatt@...hat.com>
Cc:     linux-rt-users@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        kernel-rts-sst <kernel-rts-sst@...hat.com>, jlelli@...hat.com
Subject: Re: Crash with 6.6.0-rc1-rt1 and several i915 locking call traces
 with v6.5.2-rt8 and gnome-shell on Alder Lake laptop

On 2023-09-29 04:43:32 [-0400], John B. Wyatt IV wrote:
> For stock (non-rt) I do not see it with 6.6-rc2. This was compiled
> with the Stream 9 debug config.
> 
> I was able to reproduce similar call traces once I tested again
> with 6.6-rc3-rt5 at [4] and [5].
> 
> What would be the best way to determine if the warning is wrongly
> triggered?

I looked at the traces in this email and they originate from a
might_sleep() in guc_context_set_prio(). The reason is that they check
at the atomic/interrupt state to figure out if they can sleep or not.
Both checks don't work on RT as intended and the former has a not to not
be used in drivers…

The snippet below should cure this. Could you test, please.

Sebastian


diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 8dc291ff00935..5b8d084c9c58c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -317,7 +317,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc *guc,
 {
 	int err;
 	unsigned int sleep_period_ms = 1;
-	bool not_atomic = !in_atomic() && !irqs_disabled();
+	bool not_atomic = !in_atomic() && !irqs_disabled() && !rcu_preempt_depth();
 
 	/*
 	 * FIXME: Have caller pass in if we are in an atomic context to avoid