Message-ID: <a19308ed-7252-4119-b891-2a61791bb6e5@paulmck-laptop>
Date: Tue, 27 Aug 2024 13:36:57 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Valentin Schneider <vschneid@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
	sfr@...b.auug.org.au, linux-next@...r.kernel.org,
	kernel-team@...a.com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error

On Tue, Aug 27, 2024 at 10:30:24PM +0200, Valentin Schneider wrote:
> On 27/08/24 11:35, Paul E. McKenney wrote:
> > On Tue, Aug 27, 2024 at 10:33:13AM -0700, Paul E. McKenney wrote:
> >> On Tue, Aug 27, 2024 at 05:41:52PM +0200, Valentin Schneider wrote:
> >> > I've taken tip/sched/core and shuffled hunks around; I didn't re-order any
> >> > commit. I've also taken out the dequeue from switched_from_fair() and put
> >> > it at the very top of the branch which should hopefully help bisection.
> >> >
> >> > The final delta between that branch and tip/sched/core is empty, so it
> >> > really is just shuffling in between commits.
> >> >
> >> > Please find the branch at:
> >> >
> >> > https://gitlab.com/vschneid/linux.git -b mainline/sched/eevdf-complete-builderr
> >> >
> >> > I'll go stare at the BUG itself now.
> >>
> >> Thank you!
> >>
> >> I have fired up tests on the "BROKEN?" commit.  If that fails, I will
> >> try its predecessor, and if that fails, I will bisect from e28b5f8bda01
> >> ("sched/fair: Assert {set_next,put_prev}_entity() are properly balanced"),
> >> which has stood up to heavy hammering in earlier testing.
> >
> > And 50 runs of TREE03 on the "BROKEN?" commit resulted in 32 failures.
> > Of these, 29 were the dequeue_rt_stack() failure.  Two more were RCU
> > CPU stall warnings, and the last one was an oddball "kernel BUG at
> > kernel/sched/rt.c:1714" followed by an equally oddball "Oops: invalid
> > opcode: 0000 [#1] PREEMPT SMP PTI".
> >
> > Just to be specific, this is commit:
> >
> > df8fe34bfa36 ("BROKEN? sched/fair: Dequeue sched_delayed tasks when switching from fair")
> >
> > This commit's predecessor is this commit:
> >
> > 2f888533d073 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
> >
> > This predecessor commit passes 50 runs of TREE03 with no failures.
> >
> > So the addition of that dequeue_task() call to the switched_from_fair()
> > function is looking quite suspicious to me.  ;-)
> >
> >                                                       Thanx, Paul
> 
> Thanks for the testing!
> 
> The WARN_ON_ONCE(!rt_se->on_list); hit in __dequeue_rt_entity() feels like
> a put_prev/set_next kind of issue...
> 
> So far I'd assumed a ->sched_delayed task can't be current during
> switched_from_fair(), I got confused because it's Mond^CCC Tuesday, but I
> think that still holds: we can't get a balance_dl() or balance_rt() to drop
> the RQ lock because prev would be fair, and we can't get a
> newidle_balance() with a ->sched_delayed task because we'd have
> sched_fair_runnable() := true.
> 
> I'll pick this back up tomorrow, this is a task that requires either
> caffeine or booze and it's too late for either.

Thank you for chasing this, and get some sleep!  This one is of course
annoying, but it is not (yet) an emergency.  I look forward to seeing
what you come up with.

Also, I would of course be happy to apply debug patches.

							Thanx, Paul
