Message-ID: <103b1710-39ca-40d0-947d-fdac32d6e6a0@paulmck-laptop>
Date: Tue, 27 Aug 2024 11:35:02 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Valentin Schneider <vschneid@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
	sfr@...b.auug.org.au, linux-next@...r.kernel.org,
	kernel-team@...a.com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error

On Tue, Aug 27, 2024 at 10:33:13AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 27, 2024 at 05:41:52PM +0200, Valentin Schneider wrote:
> > On 27/08/24 12:03, Valentin Schneider wrote:
> > > On 26/08/24 09:31, Paul E. McKenney wrote:
> > >> On Mon, Aug 26, 2024 at 01:44:35PM +0200, Valentin Schneider wrote:
> > >>>
> > >>> Woops...
> > >>
> > >> On the other hand, removing that dequeue_task() makes next-20240823
> > >> pass light testing.
> > >>
> > >> I have to ask...
> > >>
> > >> Does it make sense for Valentin to rearrange those commits to fix
> > >> the two build bugs and remove that dequeue_task(), all in the name of
> > >> bisectability.  Or is there something subtle here so that only Peter
> > >> can do this work, shoulder and all?
> > >>
> > >
> > > I suppose at the very least another pair of eyes on this can't hurt, let me
> > > get untangled from some other things first and I'll take a jab at it.
> > 
> > I've taken tip/sched/core and shuffled hunks around; I didn't re-order any
> > commit. I've also taken out the dequeue from switched_from_fair() and put
> > it at the very top of the branch which should hopefully help bisection.
> > 
> > The final delta between that branch and tip/sched/core is empty, so it
> > really is just shuffling in between commits.
> > 
> > Please find the branch at:
> > 
> > https://gitlab.com/vschneid/linux.git -b mainline/sched/eevdf-complete-builderr
> > 
> > I'll go stare at the BUG itself now.
> 
> Thank you!
> 
> I have fired up tests on the "BROKEN?" commit.  If that fails, I will
> try its predecessor, and if that fails, I will bisect from e28b5f8bda01
> ("sched/fair: Assert {set_next,put_prev}_entity() are properly balanced"),
> which has stood up to heavy hammering in earlier testing.

And 50 runs of TREE03 on the "BROKEN?" commit resulted in 32 failures.
Of these, 29 were the dequeue_rt_stack() failure.  Two more were RCU
CPU stall warnings, and the last one was an oddball "kernel BUG at
kernel/sched/rt.c:1714" followed by an equally oddball "Oops: invalid
opcode: 0000 [#1] PREEMPT SMP PTI".

Just to be specific, this is commit:

df8fe34bfa36 ("BROKEN? sched/fair: Dequeue sched_delayed tasks when switching from fair")

This commit's predecessor is this commit:

2f888533d073 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")

This predecessor commit passes 50 runs of TREE03 with no failures.
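
As a rough sanity check on these numbers, here is a minimal sketch (not
part of the actual test tooling) of how unlikely 50 clean runs would be
if the predecessor had the same failure rate as the "BROKEN?" commit.
It assumes the runs are independent and fail with equal probability,
which is a simplification:

```python
# Failure counts reported for 50 runs of TREE03 on df8fe34bfa36.
runs = 50
failures = 32

# Observed per-run failure rate on the "BROKEN?" commit.
failure_rate = failures / runs

# Assumption: runs are independent and identically distributed.
# Probability that all 50 runs pass at that same failure rate.
p_all_pass = (1 - failure_rate) ** runs

print(f"observed failure rate: {failure_rate:.2f}")
print(f"P(50 clean runs at that rate): {p_all_pass:.3e}")
```

Under that assumption the probability is vanishingly small, so the clean
result on the predecessor is very unlikely to be luck.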

So the addition of that dequeue_task() call to the switched_from_fair()
function is looking quite suspicious to me.  ;-)

							Thanx, Paul
