Message-ID: <87254ef1-fa58-4747-b2e1-5c85ecde15bf@windriver.com>
Date: Thu, 4 Sep 2025 10:33:20 -0600
From: Chris Friesen <chris.friesen@...driver.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: osandov@...com, Peter Zijlstra <peterz@...radead.org>
Subject: sched: observed instability under stress in 6.12 and mainline
Hi,
I'd like to draw the attention of the scheduler maintainers to a number
of kernel bugzilla reports submitted by a colleague a couple of weeks ago:
6.12.18:
https://bugzilla.kernel.org/show_bug.cgi?id=220447
https://bugzilla.kernel.org/show_bug.cgi?id=220448
v6.16-rt3:
https://bugzilla.kernel.org/show_bug.cgi?id=220450
https://bugzilla.kernel.org/show_bug.cgi?id=220449
There seems to be something wrong with either the logic or the locking.
In one case this resulted in a NULL pointer dereference in
pick_next_entity(). In another case it resulted in
BUG_ON(!rq->nr_running) in dequeue_top_rt_rq() and
SCHED_WARN_ON(!se->on_rq) in update_entity_lag().
My colleague suggests that the NULL pointer dereference may be due to
pick_eevdf() returning NULL in pick_next_entity().
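To illustrate the suspected failure mode, here is a minimal user-space sketch. It is not kernel code: every name in it (toy_entity, toy_rq, toy_pick, toy_next_deadline) is hypothetical, and the eligibility test is a simplified assumption about how an EEVDF-style pick works. It only shows how a picker can legitimately return NULL when the run-count bookkeeping disagrees with what is actually queued, and why a caller that dereferences the result without a check would fault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT the kernel's structures. */
struct toy_entity {
    long vruntime;  /* virtual runtime consumed so far */
    long deadline;  /* virtual deadline */
    int on_rq;      /* nonzero if the entity is queued */
};

struct toy_rq {
    struct toy_entity *entities;
    size_t nr;          /* number of slots in entities[] */
    size_t nr_running;  /* bookkeeping count; may disagree with the contents */
    long avg_vruntime;  /* assumed eligibility threshold */
};

/*
 * Return the queued, eligible entity with the earliest deadline,
 * or NULL if nothing qualifies. Assumption: an entity is eligible
 * when its vruntime does not exceed the queue average.
 */
static struct toy_entity *toy_pick(struct toy_rq *rq)
{
    struct toy_entity *best = NULL;

    assert(rq != NULL);
    for (size_t i = 0; i < rq->nr; i++) {
        struct toy_entity *se = &rq->entities[i];

        if (!se->on_rq || se->vruntime > rq->avg_vruntime)
            continue;
        if (!best || se->deadline < best->deadline)
            best = se;
    }
    return best;
}

/*
 * Guarded caller: returns the picked deadline, or -1 if the pick
 * came back empty. An unguarded "return toy_pick(rq)->deadline;"
 * would dereference NULL in that case.
 */
static long toy_next_deadline(struct toy_rq *rq)
{
    struct toy_entity *se = toy_pick(rq);

    if (!se)
        return -1;
    return se->deadline;
}
```

In this sketch, a queue whose nr_running is still nonzero but whose entities all have on_rq cleared makes toy_pick() return NULL, which is the kind of inconsistency the warnings above hint at.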
I did some digging and found that
https://gitlab.com/linux-kernel/stable/-/commit/86b37810 would not have
been included in 6.12.18, but the equivalent fix should have been in the
6.16 load.
We haven't yet bottomed out the root cause.
Any suggestions or assistance would be appreciated.
Thanks,
Chris