Message-ID: <xhsmh1q27o2us.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Thu, 29 Aug 2024 15:50:03 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: paulmck@...nel.org
Cc: Chen Yu <yu.c.chen@...el.com>, Peter Zijlstra <peterz@...radead.org>,
 linux-kernel@...r.kernel.org, sfr@...b.auug.org.au,
 linux-next@...r.kernel.org, kernel-team@...a.com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error

On 29/08/24 03:28, Paul E. McKenney wrote:
> On Wed, Aug 28, 2024 at 11:39:19AM -0700, Paul E. McKenney wrote:
>>
>> The 500*TREE03 run had exactly one failure that was the dreaded
>> enqueue_dl_entity() failure, followed by RCU CPU stall warnings.
>>
>> But a huge improvement over the prior state!
>>
>> Plus, this failure is likely unrelated (see earlier discussions with
>> Peter).  I just started a 5000*TREE03 run, just in case we can now
>> reproduce this thing.
>
> And we can now reproduce it!  Again, this might be an unrelated bug that
> was previously a one-off (OK, OK, a two-off!).  Or this series might
> have made it more probable.  Who knows?
>
> Eight of those 5000 runs got us this splat in enqueue_dl_entity():
>
>       WARN_ON_ONCE(on_dl_rq(dl_se));
>
> Immediately followed by this splat in __enqueue_dl_entity():
>
>       WARN_ON_ONCE(!RB_EMPTY_NODE(&dl_se->rb_node));
>
> These two splats always happened during rcutorture's testing of
> RCU priority boosting.  This testing involves spawning a CPU-bound
> low-priority real-time kthread for each CPU, which is intended to starve
> the non-realtime RCU readers, which are in turn to be rescued by RCU
> priority boosting.
>

Thanks!

> I do not entirely trust the following rcutorture diagnostic, but just
> in case it helps...
>
> Many of them also printed something like this:
>
> [  111.279575] Boost inversion persisted: No QS from CPU 3
>
> This message means that rcutorture has decided that RCU priority boosting
> has failed, but not because a low-priority preempted task was blocking
> the grace period, but rather because some CPU managed to be running
> the same task in-kernel the whole time without doing a context switch.
> In some cases (but not this one), this was simply a side-effect of
> RCU's grace-period kthread being starved of CPU time.  Such starvation
> is a surprise in this case because this kthread is running at higher
> real-time priority than the kthreads that are intended to force RCU
> priority boosting to happen.
>
> Again, I do not entirely trust this rcutorture diagnostic, but just in
> case it helps.
>
>                                                       Thanx, Paul
>
> ------------------------------------------------------------------------
>
> [  287.536845] rcu-torture: rcu_torture_boost is stopping
> [  287.536867] ------------[ cut here ]------------
> [  287.540661] WARNING: CPU: 4 PID: 132 at kernel/sched/deadline.c:2003 enqueue_dl_entity+0x50d/0x5c0
> [  287.542299] Modules linked in:
> [  287.542868] CPU: 4 UID: 0 PID: 132 Comm: kcompactd0 Not tainted 6.11.0-rc1-00051-gb32d207e39de #1701
> [  287.544335] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [  287.546337] RIP: 0010:enqueue_dl_entity+0x50d/0x5c0
> [  287.603245]  ? __warn+0x7e/0x120
> [  287.603752]  ? enqueue_dl_entity+0x54b/0x5c0
> [  287.604405]  ? report_bug+0x18e/0x1a0
> [  287.604978]  ? handle_bug+0x3d/0x70
> [  287.605523]  ? exc_invalid_op+0x18/0x70
> [  287.606116]  ? asm_exc_invalid_op+0x1a/0x20
> [  287.606765]  ? enqueue_dl_entity+0x54b/0x5c0
> [  287.607420]  dl_server_start+0x31/0xe0
> [  287.608013]  enqueue_task_fair+0x218/0x680
> [  287.608643]  activate_task+0x21/0x50
> [  287.609197]  attach_task+0x30/0x50
> [  287.609736]  sched_balance_rq+0x65d/0xe20
> [  287.610351]  sched_balance_newidle.constprop.0+0x1a0/0x360
> [  287.611205]  pick_next_task_fair+0x2a/0x2e0
> [  287.611849]  __schedule+0x106/0x8b0


Assuming this is still related to switched_from_fair(): since this is hit
during priority boosting, that would mean rt_mutex_setprio() gets
involved, but that uses the same set of DQ/EQ flags as
__sched_setscheduler().

I don't see any obvious path in

dequeue_task_fair()
`\
  dequeue_entities()

that would prevent dl_server_stop() from happening when doing the
class-switch dequeue_task()... I don't see it in the TREE03 config, but can
you confirm CONFIG_CFS_BANDWIDTH isn't set in that scenario?
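
To make the suspected failure mode concrete, here is a toy user-space
model, not kernel code. It assumes the fair server is started when the
first fair task is enqueued on a runqueue and stopped when the last one
is dequeued. Every toy_* name and the skip_stop parameter are
hypothetical and exist only for illustration. A dequeue that skips the
stop leaves the server queued, so the next start hits the equivalent of
WARN_ON_ONCE(on_dl_rq(dl_se)).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_rq {
	unsigned int nr_running;  /* runnable fair tasks on this runqueue */
	bool server_queued;       /* stands in for on_dl_rq(dl_se) */
	unsigned int warnings;    /* counts would-be WARN_ON_ONCE() hits */
};

/* Stands in for dl_server_start(): queueing twice is the bug. */
static void toy_server_start(struct toy_rq *rq)
{
	if (rq->server_queued) {
		rq->warnings++;   /* server was never stopped */
		return;
	}
	rq->server_queued = true;
}

/* Stands in for dl_server_stop(). */
static void toy_server_stop(struct toy_rq *rq)
{
	rq->server_queued = false;
}

/* Assumption: the first fair task to arrive starts the server. */
static void toy_enqueue(struct toy_rq *rq)
{
	if (rq->nr_running == 0)
		toy_server_start(rq);
	rq->nr_running++;
}

/*
 * Assumption: the last fair task to leave stops the server.
 * skip_stop models a hypothetical dequeue path that returns early.
 */
static void toy_dequeue(struct toy_rq *rq, bool skip_stop)
{
	assert(rq->nr_running > 0);
	rq->nr_running--;
	if (rq->nr_running == 0 && !skip_stop)
		toy_server_stop(rq);
}
```

In this model, enqueue / dequeue / enqueue with skip_stop == false never
bumps warnings, while a single dequeue with skip_stop == true followed
by an enqueue bumps it once. The open question above is whether any real
path behaves like skip_stop.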

I'm not sure yet whether this is related to the switched_from_fair()
hackery or not. I'll send the patch I have as-is and keep digging for a
bit.