Message-ID: <250cde11-650f-4689-9c36-816406f1b9b8@paulmck-laptop>
Date: Sun, 8 Sep 2024 09:32:18 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Valentin Schneider <vschneid@...hat.com>
Cc: Chen Yu <yu.c.chen@...el.com>, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, sfr@...b.auug.org.au,
linux-next@...r.kernel.org, kernel-team@...a.com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error
On Thu, Aug 29, 2024 at 07:13:07AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 29, 2024 at 03:50:03PM +0200, Valentin Schneider wrote:
> > On 29/08/24 03:28, Paul E. McKenney wrote:
> > > On Wed, Aug 28, 2024 at 11:39:19AM -0700, Paul E. McKenney wrote:
> > >>
> > >> The 500*TREE03 run had exactly one failure that was the dreaded
> > >> enqueue_dl_entity() failure, followed by RCU CPU stall warnings.
> > >>
> > >> But a huge improvement over the prior state!
> > >>
> > >> Plus, this failure is likely unrelated (see earlier discussions with
> > >> Peter). I just started a 5000*TREE03 run, just in case we can now
> > >> reproduce this thing.
> > >
> > > And we can now reproduce it! Again, this might be an unrelated bug that
> > > was previously a one-off (OK, OK, a two-off!). Or this series might
> > > have made it more probable. Who knows?
> > >
> > > Eight of those 5000 runs got us this splat in enqueue_dl_entity():
> > >
> > > WARN_ON_ONCE(on_dl_rq(dl_se));
> > >
> > > Immediately followed by this splat in __enqueue_dl_entity():
> > >
> > > WARN_ON_ONCE(!RB_EMPTY_NODE(&dl_se->rb_node));
> > >
> > > These two splats always happened during rcutorture's testing of
> > > RCU priority boosting. This testing involves spawning a CPU-bound
> > > low-priority real-time kthread for each CPU, which is intended to starve
> > > the non-realtime RCU readers, which are in turn to be rescued by RCU
> > > priority boosting.
> > >
> >
> > Thanks!
> >
> > > I do not entirely trust the following rcutorture diagnostic, but just
> > > in case it helps...
> > >
> > > Many of them also printed something like this as well:
> > >
> > > [ 111.279575] Boost inversion persisted: No QS from CPU 3
> > >
> > > This message means that rcutorture has decided that RCU priority boosting
> > > has failed, not because a low-priority preempted task was blocking
> > > the grace period, but rather because some CPU managed to be running
> > > the same task in-kernel the whole time without doing a context switch.
> > > In some cases (but not this one), this was simply a side-effect of
> > > RCU's grace-period kthread being starved of CPU time. Such starvation
> > > is a surprise in this case because this kthread is running at higher
> > > real-time priority than the kthreads that are intended to force RCU
> > > priority boosting to happen.
> > >
> > > Again, I do not entirely trust this rcutorture diagnostic, but am
> > > passing it along just in case it helps.
> > >
> > > Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > [ 287.536845] rcu-torture: rcu_torture_boost is stopping
> > > [ 287.536867] ------------[ cut here ]------------
> > > [ 287.540661] WARNING: CPU: 4 PID: 132 at kernel/sched/deadline.c:2003 enqueue_dl_entity+0x50d/0x5c0
> > > [ 287.542299] Modules linked in:
> > > [ 287.542868] CPU: 4 UID: 0 PID: 132 Comm: kcompactd0 Not tainted 6.11.0-rc1-00051-gb32d207e39de #1701
> > > [ 287.544335] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > > [ 287.546337] RIP: 0010:enqueue_dl_entity+0x50d/0x5c0
> > > [ 287.603245] ? __warn+0x7e/0x120
> > > [ 287.603752] ? enqueue_dl_entity+0x54b/0x5c0
> > > [ 287.604405] ? report_bug+0x18e/0x1a0
> > > [ 287.604978] ? handle_bug+0x3d/0x70
> > > [ 287.605523] ? exc_invalid_op+0x18/0x70
> > > [ 287.606116] ? asm_exc_invalid_op+0x1a/0x20
> > > [ 287.606765] ? enqueue_dl_entity+0x54b/0x5c0
> > > [ 287.607420] dl_server_start+0x31/0xe0
> > > [ 287.608013] enqueue_task_fair+0x218/0x680
> > > [ 287.608643] activate_task+0x21/0x50
> > > [ 287.609197] attach_task+0x30/0x50
> > > [ 287.609736] sched_balance_rq+0x65d/0xe20
> > > [ 287.610351] sched_balance_newidle.constprop.0+0x1a0/0x360
> > > [ 287.611205] pick_next_task_fair+0x2a/0x2e0
> > > [ 287.611849] __schedule+0x106/0x8b0
> >
> >
> > Assuming this is still related to switched_from_fair(): since this is
> > hit during priority boosting, it would mean rt_mutex_setprio() gets
> > involved, but that uses the same set of DQ/EQ flags as
> > __sched_setscheduler().
> >
> > I don't see any obvious path in
> >
> > dequeue_task_fair()
> > `\
> > dequeue_entities()
> >
> > that would prevent dl_server_stop() from happening when doing the
> > class-switch dequeue_task()... I don't see it in the TREE03 config, but can
> > you confirm CONFIG_CFS_BANDWIDTH isn't set in that scenario?
> >
> > I'm going to keep digging, but I'm not entirely sure yet whether this is
> > related to the switched_from_fair() hackery or not; I'll send the patch I
> > have as-is and continue digging for a bit.
>
> Makes sense to me, thank you, and glad that the diagnostics helped.
>
> Looking forward to further fixes. ;-)
Just following up...
For whatever it is worth, on last night's run of next-20240906, I got
nine failures out of 100 6-hour runs of rcutorture's TREE03 scenario.
These failures were often, but not always, shortly followed by a hard hang.
The warning at line 1995 is the WARN_ON_ONCE(on_dl_rq(dl_se))
in enqueue_dl_entity() and the warning at line 1971 is the
WARN_ON_ONCE(!RB_EMPTY_NODE(&dl_se->rb_node)) in __enqueue_dl_entity().
The pair of splats is shown below, in case it helps.
Thanx, Paul
------------------------------------------------------------------------
[21122.992435] ------------[ cut here ]------------
[21122.994090] WARNING: CPU: 13 PID: 8032 at kernel/sched/deadline.c:1995 enqueue_dl_entity+0x511/0x5d0
[21122.995554] Modules linked in:
[21122.996048] CPU: 13 UID: 0 PID: 8032 Comm: kworker/13:1 Not tainted 6.11.0-rc6-next-20240906-dirty #2006
[21122.997548] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[21122.999310] Workqueue: 0x0 (mm_percpu_wq)
[21122.999981] RIP: 0010:enqueue_dl_entity+0x511/0x5d0
[21123.000757] Code: ff 48 89 ef e8 b0 d2 ff ff 0f b6 4d 54 e9 0e fc ff ff 85 db 0f 84 d0 fe ff ff 5b 44 89 e6 48 89 ef 5d 41 5c e9 00 df ff ff 90 <0f> 0b 90 e9 fa fa ff ff 48 8b bb f8 09 00 00 48 39 fe 0f 89 de fb
[21123.003697] RSP: 0000:ffffabca84373bf0 EFLAGS: 00010086
[21123.004537] RAX: 0000000000000001 RBX: ffff98381f56cae8 RCX: 0000000000000002
[21123.005660] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff98381f56cae8
[21123.006781] RBP: ffff98381f56cae8 R08: 0000000000000001 R09: 0000000000000161
[21123.007902] R10: 0000000000000000 R11: ffff983802399d90 R12: 0000000000000001
[21123.009026] R13: 00000000002dc6c0 R14: ffff98381f56c240 R15: ffff98381f56c280
[21123.010213] FS: 0000000000000000(0000) GS:ffff98381f540000(0000) knlGS:0000000000000000
[21123.011584] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21123.012498] CR2: 0000000000000000 CR3: 0000000002c16000 CR4: 00000000000006f0
[21123.013647] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21123.014780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[21123.015915] Call Trace:
[21123.016326] <TASK>
[21123.016672] ? __warn+0x83/0x130
[21123.017198] ? enqueue_dl_entity+0x511/0x5d0
[21123.017877] ? report_bug+0x18e/0x1a0
[21123.018469] ? handle_bug+0x54/0x90
[21123.019027] ? exc_invalid_op+0x18/0x70
[21123.019647] ? asm_exc_invalid_op+0x1a/0x20
[21123.020318] ? enqueue_dl_entity+0x511/0x5d0
[21123.020997] dl_server_start+0x31/0xe0
[21123.021603] enqueue_task_fair+0x218/0x680
[21123.022264] activate_task+0x21/0x50
[21123.022837] attach_task+0x30/0x50
[21123.023389] sched_balance_rq+0x65e/0xe00
[21123.024031] sched_balance_newidle.constprop.0+0x190/0x360
[21123.024903] pick_next_task_fair+0x2a/0x340
[21123.025576] __schedule+0x10e/0x8b0
[21123.026135] ? queue_delayed_work_on+0x53/0x60
[21123.026849] schedule+0x22/0xd0
[21123.027366] worker_thread+0x1a2/0x3a0
[21123.027963] ? __pfx_worker_thread+0x10/0x10
[21123.028651] kthread+0xd1/0x100
[21123.029153] ? __pfx_kthread+0x10/0x10
[21123.029758] ret_from_fork+0x2f/0x50
[21123.030334] ? __pfx_kthread+0x10/0x10
[21123.030933] ret_from_fork_asm+0x1a/0x30
[21123.031566] </TASK>
[21123.031920] ---[ end trace 0000000000000000 ]---
[21123.032669] ------------[ cut here ]------------
[21123.033409] WARNING: CPU: 13 PID: 8032 at kernel/sched/deadline.c:1971 enqueue_dl_entity+0x54f/0x5d0
[21123.034853] Modules linked in:
[21123.035354] CPU: 13 UID: 0 PID: 8032 Comm: kworker/13:1 Tainted: G W 6.11.0-rc6-next-20240906-dirty #2006
[21123.037081] Tainted: [W]=WARN
[21123.037562] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[21123.039331] Workqueue: 0x0 (mm_percpu_wq)
[21123.039984] RIP: 0010:enqueue_dl_entity+0x54f/0x5d0
[21123.040767] Code: de fb ff ff e9 66 ff ff ff 89 c1 45 84 d2 0f 84 ce fb ff ff a8 20 0f 84 c6 fb ff ff 84 c0 0f 89 20 fe ff ff e9 b9 fb ff ff 90 <0f> 0b 90 e9 f4 fb ff ff 84 d2 0f 85 e3 fa ff ff 48 89 ea 48 8d b5
[21123.043716] RSP: 0000:ffffabca84373bf0 EFLAGS: 00010086
[21123.044549] RAX: 00000000ffffff00 RBX: ffff98381f56c240 RCX: 0000000000000000
[21123.045676] RDX: 0000000000000001 RSI: 0000000b1a2986b8 RDI: 0000000b1a2154a4
[21123.046806] RBP: ffff98381f56cae8 R08: ffff98381f56ca80 R09: 000000003b9aca00
[21123.047931] R10: 0000000000000001 R11: 00000000000ee6b2 R12: 0000000000000001
[21123.049061] R13: 00000000002dc6c0 R14: ffff98381f56c240 R15: ffff98381f56c280
[21123.050469] FS: 0000000000000000(0000) GS:ffff98381f540000(0000) knlGS:0000000000000000
[21123.051761] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21123.052679] CR2: 0000000000000000 CR3: 0000000002c16000 CR4: 00000000000006f0
[21123.053817] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21123.054952] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[21123.056086] Call Trace:
[21123.056691] <TASK>
[21123.057205] ? __warn+0x83/0x130
[21123.057984] ? enqueue_dl_entity+0x54f/0x5d0
[21123.058989] ? report_bug+0x18e/0x1a0
[21123.059865] ? handle_bug+0x54/0x90
[21123.060689] ? exc_invalid_op+0x18/0x70
[21123.061591] ? asm_exc_invalid_op+0x1a/0x20
[21123.062584] ? enqueue_dl_entity+0x54f/0x5d0
[21123.063337] dl_server_start+0x31/0xe0
[21123.063939] enqueue_task_fair+0x218/0x680
[21123.064604] activate_task+0x21/0x50
[21123.065185] attach_task+0x30/0x50
[21123.065729] sched_balance_rq+0x65e/0xe00
[21123.066377] sched_balance_newidle.constprop.0+0x190/0x360
[21123.067255] pick_next_task_fair+0x2a/0x340
[21123.067921] __schedule+0x10e/0x8b0
[21123.068524] ? queue_delayed_work_on+0x53/0x60
[21123.069238] schedule+0x22/0xd0
[21123.069742] worker_thread+0x1a2/0x3a0
[21123.070347] ? __pfx_worker_thread+0x10/0x10
[21123.071025] kthread+0xd1/0x100
[21123.071536] ? __pfx_kthread+0x10/0x10
[21123.072133] ret_from_fork+0x2f/0x50
[21123.072714] ? __pfx_kthread+0x10/0x10
[21123.073323] ret_from_fork_asm+0x1a/0x30
[21123.073949] </TASK>
[21123.074314] ---[ end trace 0000000000000000 ]---