Message-ID: <87254ef1-fa58-4747-b2e1-5c85ecde15bf@windriver.com>
Date: Thu, 4 Sep 2025 10:33:20 -0600
From: Chris Friesen <chris.friesen@...driver.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: osandov@...com, Peter Zijlstra <peterz@...radead.org>
Subject: sched: observed instability under stress in 6.12 and mainline

Hi,

I'd like to draw the attention of the scheduler maintainers to a number 
of kernel bugzilla reports submitted by a colleague a couple of weeks ago:

6.12.18:
https://bugzilla.kernel.org/show_bug.cgi?id=220447
https://bugzilla.kernel.org/show_bug.cgi?id=220448

v6.16-rt3:
https://bugzilla.kernel.org/show_bug.cgi?id=220450
https://bugzilla.kernel.org/show_bug.cgi?id=220449

There seems to be something wrong with either the logic or the locking. 
In one case this resulted in a NULL pointer dereference in 
pick_next_entity().  In another case it resulted in 
BUG_ON(!rq->nr_running) in dequeue_top_rt_rq() and 
SCHED_WARN_ON(!se->on_rq) in update_entity_lag().

My colleague suggests that the NULL pointer dereference may be due to 
pick_eevdf() returning NULL in pick_next_entity().
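
To illustrate the failure mode suggested above, here is a minimal user-space
sketch in C. It is not kernel code: toy_entity, toy_queue, toy_pick() and
toy_pick_next_checked() are hypothetical names, and the fixed-size array is an
assumption standing in for the real run queue. It only shows how a picker that
returns NULL while the accounting still claims a queued entity would trip up a
caller that reads a field of the result without checking it first.

```c
#include <stddef.h>

#define TOY_SLOTS 4

/* Hypothetical stand-in for a scheduling entity (not struct sched_entity). */
struct toy_entity {
    int sched_delayed;  /* non-zero: entity should be skipped by the caller */
    int id;
};

/* Hypothetical queue whose counter can disagree with its actual contents. */
struct toy_queue {
    struct toy_entity *entities[TOY_SLOTS];
    size_t nr_queued;   /* what the accounting claims is queued */
};

/* Toy picker: returns the first populated slot, or NULL if every slot is empty. */
static struct toy_entity *toy_pick(const struct toy_queue *q)
{
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (q->entities[i])
            return q->entities[i];
    }
    return NULL;
}

/*
 * Defensive caller: checks the picker's result before reading any field.
 * Returns 0 on success and -1 when the picker found nothing.
 * An unchecked caller would read se->sched_delayed here and fault on NULL.
 */
static int toy_pick_next_checked(const struct toy_queue *q,
                                 struct toy_entity **out)
{
    struct toy_entity *se = toy_pick(q);

    if (!se) {
        *out = NULL;
        return -1;
    }
    *out = se->sched_delayed ? NULL : se;
    return 0;
}
```

With a queue whose nr_queued is 1 but whose slots are all empty, toy_pick()
returns NULL and the checked caller returns -1 instead of dereferencing the
result.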

I did some digging and found that 
https://gitlab.com/linux-kernel/stable/-/commit/86b37810 would not have 
been included in 6.12.18, but the equivalent fix should have been in the 
6.16 build we tested.

We haven't yet identified the root cause.

Any suggestions or assistance would be appreciated.

Thanks,
Chris

