Message-ID: <xhsmhcyltogin.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Tue, 27 Aug 2024 22:30:24 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: paulmck@...nel.org
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
sfr@...b.auug.org.au, linux-next@...r.kernel.org, kernel-team@...a.com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error
On 27/08/24 11:35, Paul E. McKenney wrote:
> On Tue, Aug 27, 2024 at 10:33:13AM -0700, Paul E. McKenney wrote:
>> On Tue, Aug 27, 2024 at 05:41:52PM +0200, Valentin Schneider wrote:
>> > I've taken tip/sched/core and shuffled hunks around; I didn't re-order any
>> > commit. I've also taken out the dequeue from switched_from_fair() and put
>> > it at the very top of the branch which should hopefully help bisection.
>> >
>> > The final delta between that branch and tip/sched/core is empty, so it
> >> > really is just shuffling in between commits.
>> >
>> > Please find the branch at:
>> >
>> > https://gitlab.com/vschneid/linux.git -b mainline/sched/eevdf-complete-builderr
>> >
>> > I'll go stare at the BUG itself now.
>>
>> Thank you!
>>
>> I have fired up tests on the "BROKEN?" commit. If that fails, I will
> >> try its predecessor, and if that fails, I will bisect from e28b5f8bda01
>> ("sched/fair: Assert {set_next,put_prev}_entity() are properly balanced"),
>> which has stood up to heavy hammering in earlier testing.
>
> And 50 runs of TREE03 on the "BROKEN?" commit resulted in 32 failures.
> Of these, 29 were the dequeue_rt_stack() failure. Two more were RCU
> CPU stall warnings, and the last one was an oddball "kernel BUG at
> kernel/sched/rt.c:1714" followed by an equally oddball "Oops: invalid
> opcode: 0000 [#1] PREEMPT SMP PTI".
>
> Just to be specific, this is commit:
>
> df8fe34bfa36 ("BROKEN? sched/fair: Dequeue sched_delayed tasks when switching from fair")
>
> This commit's predecessor is this commit:
>
> 2f888533d073 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
>
> This predecessor commit passes 50 runs of TREE03 with no failures.
>
> So the addition of that dequeue_task() call to the switched_from_fair()
> function is looking quite suspicious to me. ;-)
>
> Thanx, Paul
Thanks for the testing!

The WARN_ON_ONCE(!rt_se->on_list) hit in __dequeue_rt_entity() feels like
a put_prev/set_next kind of issue...

So far I'd assumed a ->sched_delayed task can't be current during
switched_from_fair(). I got confused because it's Mond^CCC Tuesday, but I
think that still holds: we can't get a balance_dl() or balance_rt() to drop
the RQ lock because prev would be fair, and we can't get a
newidle_balance() with a ->sched_delayed task because we'd have
sched_fair_runnable() := true.

I'll pick this back up tomorrow; this is a task that requires either
caffeine or booze, and it's too late for either.