Message-Id: <20241203115717.62392-1-ryotkkr98@gmail.com>
Date: Tue,  3 Dec 2024 20:57:17 +0900
From: Ryo Takakura <ryotkkr98@...il.com>
To: boqun.feng@...il.com,
	peterz@...radead.org
Cc: bigeasy@...utronix.de,
	clrkwllms@...nel.org,
	linux-kernel@...r.kernel.org,
	linux-rt-devel@...ts.linux.dev,
	longman@...hat.com,
	mingo@...hat.com,
	rostedt@...dmis.org,
	ryotkkr98@...il.com,
	tglx@...utronix.de,
	will@...nel.org
Subject: Re: [PATCH] lockdep: Fix wait context check on softirq for PREEMPT_RT

Hi Peter and Boqun,
Thanks for getting back!

On Mon, 2 Dec 2024 23:49:24 -0800, Boqun Feng wrote:
>On Mon, Dec 02, 2024 at 11:32:28AM +0100, Peter Zijlstra wrote:
>> On Mon, Dec 02, 2024 at 10:20:17AM +0900, Ryo Takakura wrote:
>> > Commit 0c1d7a2c2d32 ("lockdep: Remove softirq accounting on
>> > PREEMPT_RT.") stopped updating @softirq_context on PREEMPT_RT
>> > to ignore "inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage"
>> > as the report accounts softirq context which PREEMPT_RT doesn't
>> > have to.
>> > 
>> > However, wait context check still needs to report mutex usage
>> > within softirq, even when it's threaded on PREEMPT_RT. The check
>> > is failing to report the usage as task_wait_context() checks if
>> > it's in softirq by referencing @softirq_context, ending up not
>> > assigning the correct wait type of LD_WAIT_CONFIG for PREEMPT_RT's
>> > softirq.
>> > 
>> > [    0.184549]   | wait context tests |
>> > [    0.184549]   --------------------------------------------------------------------------
>> > [    0.184549]                                  | rcu  | raw  | spin |mutex |
>> > [    0.184549]   --------------------------------------------------------------------------
>> > [    0.184550]                in hardirq context:  ok  |  ok  |  ok  |  ok  |
>> > [    0.185083] in hardirq context (not threaded):  ok  |  ok  |  ok  |  ok  |
>> > [    0.185606]                in softirq context:  ok  |  ok  |  ok  |FAILED|
>> > 
>> > Account softirq context but only when !PREEMPT_RT so that
>> > task_wait_context() returns LD_WAIT_CONFIG as intended.
>> > 
>> > Signed-off-by: Ryo Takakura <ryotkkr98@...il.com>
>> > 
>> > 
>> > ---
>> > 
>> > Hi! 
>> > 
>> > I wasn't able to come up with a way to fix the wait context test while
>> > keeping the commit 0c1d7a2c2d32 ("lockdep: Remove softirq accounting
>> > on PREEMPT_RT.") without referencing @softirq_context...
>> > Hoping to get feedback on it!
>> > 
>> > Also I wonder if the test can be skipped as I believe it's taken care of
>
>Skipping the test would be awful because tests are supposed to catch
>unexpected bugs :/
>
>> > by spinlock wait context test since the PREEMPT_RT's softirq context is 
>> > protected by local_lock which is mapped to rt_spinlock.
>> 
>> Right,.. so I remember talking about this with Boqun, and I think we
>> were going to 'fix' the test, but I can't quite remember.
>> 
>> Perhaps adding the local_lock to SOFTIRQ_ENTER?
>
>So I took a look, SOFTIRQ_ENTER() already calls local_bh_disable(),
>which is supposed to acquire a local_lock "softirq_ctrl.lock" (Ryo, I
>believe this is the local_lock you mentioned above?) in normal cases.

Yes, and I was assuming the normal case...

After Peter's feedback, I was wondering why the wait context
selftest reported nothing if the local_lock was already
acquired (answered below!).

>However, if local_bh_disable() is called with preempt disabled, then no
>local_lock will be acquired. For example, if you do:
>
>	preempt_disable();
>	local_bh_disable();
>	preempt_enable();
>	mutex_lock();
>
>no local_lock will be acquired, therefore check_wait_context() will
>report nothing. The fun part of "why this caused an issue in the lockdep
>selftests?" is that these tests are run with preempt_count() == 1 ;-) I
>guess this is because we run these in an early stage of kernel booting?
>Will take a look tomorrow.

I see! That is indeed quite fun!

>Maybe the right way to fix this is adding a conceptual local_lock for
>BH disable like below.
>
>Regards,
>Boqun
>
>------------------------->8
>diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
>index fc53e0ad56d9..d5b898588277 100644
>--- a/include/linux/bottom_half.h
>+++ b/include/linux/bottom_half.h
>@@ -4,6 +4,7 @@
> 
> #include <linux/instruction_pointer.h>
> #include <linux/preempt.h>
>+#include <linux/lockdep.h>
> 
> #if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_TRACE_IRQFLAGS)
> extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt);
>@@ -15,9 +16,12 @@ static __always_inline void __local_bh_disable_ip(unsigned long ip, unsigned int
> }
> #endif
> 
>+extern struct lockdep_map bh_lock_map;
>+
> static inline void local_bh_disable(void)
> {
> 	__local_bh_disable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
>+	lock_map_acquire(&bh_lock_map);
> }
> 
> extern void _local_bh_enable(void);
>@@ -25,6 +29,7 @@ extern void __local_bh_enable_ip(unsigned long ip, unsigned int cnt);
> 
> static inline void local_bh_enable_ip(unsigned long ip)
> {
>+	lock_map_release(&bh_lock_map);
> 	__local_bh_enable_ip(ip, SOFTIRQ_DISABLE_OFFSET);
> }
> 
>diff --git a/kernel/softirq.c b/kernel/softirq.c
>index 8b41bd13cc3d..17d9bf6e0caf 100644
>--- a/kernel/softirq.c
>+++ b/kernel/softirq.c
>@@ -1066,3 +1066,13 @@ unsigned int __weak arch_dynirq_lower_bound(unsigned int from)
> {
> 	return from;
> }
>+
>+static struct lock_class_key bh_lock_key;
>+struct lockdep_map bh_lock_map = {
>+	.name = "local_bh",
>+	.key = &bh_lock_key,
>+	.wait_type_outer = LD_WAIT_FREE,
>+	.wait_type_inner = LD_WAIT_CONFIG, /* PREEMPT_RT makes BH preemptible. */
>+	.lock_type = LD_LOCK_PERCPU,
>+};
>+EXPORT_SYMBOL_GPL(bh_lock_map);

Let me take a look at it!

Sincerely,
Ryo Takakura
