Message-ID: <b854f67c-93c5-41b8-900e-69c78e0ecab7@paulmck-laptop>
Date: Tue, 17 Dec 2024 08:42:04 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Tomas Glozar <tglozar@...hat.com>
Cc: Valentin Schneider <vschneid@...hat.com>, Chen Yu <yu.c.chen@...el.com>,
	Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
	sfr@...b.auug.org.au, linux-next@...r.kernel.org,
	kernel-team@...a.com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error
On Mon, Dec 16, 2024 at 11:36:25AM -0800, Paul E. McKenney wrote:
> On Mon, Dec 16, 2024 at 03:38:20PM +0100, Tomas Glozar wrote:
> > On Sun, Dec 15, 2024 at 19:41, Paul E. McKenney <paulmck@...nel.org> wrote:
> > >
> > > And the fix for the TREE03 too-short grace periods is finally in, at
> > > least in prototype form:
> > >
> > > https://lore.kernel.org/all/da5065c4-79ba-431f-9d7e-1ca314394443@paulmck-laptop/
> > >
> > > Or this commit on -rcu:
> > >
> > > 22bee20913a1 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> > >
> > > This passes more than 30 hours of 400 concurrent instances of rcutorture's
> > > TREE03 scenario, with modifications that brought the bug reproduction
> > > rate up to 50 per hour. I therefore have strong reason to believe that
> > > this fix is a real fix.
> > >
> > > With this fix in place, a 20-hour run of 400 concurrent instances
> > > of rcutorture's TREE03 scenario resulted in 50 instances of the
> > > enqueue_dl_entity() splat pair. One (untrimmed) instance of this pair
> > > of splats is shown below.
> > >
> > > You guys did reproduce this some time back, so unless you tell me
> > > otherwise, I will assume that you have this in hand. I would of course
> > > be quite happy to help, especially with adding carefully chosen debug
> > > (heisenbug and all that) or testing of alleged fixes.
> > >
> >
> > The same splat was recently reported to LKML [1] and a patchset was
> > sent and merged into tip/sched/urgent that fixes a few bugs around
> > double-enqueue of the deadline server [2]. I'm currently re-running
> > TREE03 with those patches, hoping they will also fix this issue.
>
> Thank you very much!
>
> An initial four-hour test of 400 instances of an enhanced TREE03 ran
> error-free. I would have expected about 10 errors, so this gives me
> 99.9+% confidence that the patches improved things at least a little
> bit and 99% confidence that these patches reduced the error rate by at
> least a factor of two.
>
> I am starting an overnight run. If that completes without error, this
> will provide 99% confidence that these patches reduced the error rate
> by at least an order of magnitude.

And we have that level of confidence!

Tested-by: Paul E. McKenney <paulmck@...nel.org>
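
The confidence figures quoted above can be reproduced with a short calculation. The following is a minimal sketch, assuming that failures arrive as a Poisson process and that the pre-patch rate is the 50 splats per 20 hours (2.5 per hour) reported earlier in the thread. The message does not say which method was actually used, so the model and the function names here are illustrative assumptions, not the author's procedure.

```python
import math


def confidence_rate_reduced(expected_errors, factor=1.0):
    """Confidence that the error rate fell by at least `factor`,
    given that a run expected to show `expected_errors` failures
    at the old rate showed none.

    Assumption: failures follow a Poisson distribution, so the chance
    of seeing zero failures with mean m is exp(-m).
    """
    reduced_mean = expected_errors / factor
    return 1.0 - math.exp(-reduced_mean)


def hours_needed(rate_per_hour, factor, confidence):
    """Error-free run length needed to claim, at `confidence`, that the
    rate fell by at least `factor` (same Poisson assumption)."""
    return -math.log(1.0 - confidence) * factor / rate_per_hour


if __name__ == "__main__":
    old_rate = 50 / 20            # splats per hour before the patches
    expected = old_rate * 4       # four-hour run: about 10 expected errors

    # Any improvement at all: about 0.99995, matching "99.9+%".
    print(confidence_rate_reduced(expected))
    # At least a factor of two: about 0.993, matching "99%".
    print(confidence_rate_reduced(expected, factor=2.0))
    # Error-free hours needed for a factor of ten at 99%: about 18.4.
    print(hours_needed(old_rate, factor=10.0, confidence=0.99))
```

Under these assumptions, an error-free overnight run of roughly 18.4 hours or more is what supports the order-of-magnitude claim at the 99% level.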