Message-ID: <aXuZysNQCVrfFzx7@gpd4>
Date: Thu, 29 Jan 2026 18:32:58 +0100
From: Andrea Righi <arighi@...dia.com>
To: gmonaco@...hat.com
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>,
	Joel Fernandes <joelagnelf@...dia.com>,
	David Vernet <void@...ifault.com>,
	Changwoo Min <changwoo@...lia.com>,
	Daniel Hodges <hodgesd@...a.com>, sched-ext@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched/deadline: Reset dl_server execution state on
 stop

Hi Gabriele,

On Thu, Jan 29, 2026 at 12:48:35PM +0100, gmonaco@...hat.com wrote:
> On Wed, 2026-01-28 at 14:41 +0100, Andrea Righi wrote:
> > Just to make sure we're testing the same thing, I'm currently using
> > https://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git,
> > branch
> > scx-dl-server.
> > 
> > I'm running this test inside virtme-ng:
> >   $ vng -vb --config tools/testing/selftests/sched_ext/config
> >   $ vng -v -- tools/testing/selftests/sched_ext/runner -t rt_stall
> 
> Well, that's a fun one, I could reproduce the same failure you
> described in vng on another x86 box.
> 
> The arm box (bare metal) I used initially still passes just fine all 4
> iterations of the test.
> 
> 
> On the x86 box (vng) I tried different orders of iterations (where the
> original is fair-ext-fair-ext) with and without the ext server active.
> 
> No ext-server: the ext iteration fails and breaks also fair (unlike the
> arm64 box where the fair was intact)
> ext-server active: a sequence fair-ext breaks both (like you observe).
> 
> I don't have time to look further into this right now, but it looks
> like an interesting pattern.

Thanks for checking and reproducing it.

The fact that these issues around DL server stop/start transitions can be
triggered by introducing an additional DL server (EXT) makes me wonder
whether this could become even more problematic as we add more DL servers
(hierarchical DL servers?).

Considering that unconditionally clearing dl_defer_running in
dl_server_stop() seems to re-establish a clear state-machine workflow,
I think we should go with that fix for now, so we can unblock the EXT DL
server patch set. With that change in place, all the server combinations
and sequences I've tested seem to behave consistently.
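
To illustrate the idea, here is a toy user-space model of a stop/start
cycle. It is only a sketch and not the kernel code: the struct, the toy_*
helpers and the reset_unconditionally switch are hypothetical names made up
for this example, and the "stale flag skips the defer phase" behavior is an
assumption about the failure mode, not something verified against
kernel/sched/deadline.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: field names echo the thread, but this is NOT kernel code. */
struct toy_dl_server {
	bool active;		/* server has been started */
	bool defer_armed;	/* deferred start is pending */
	bool defer_running;	/* server is running after the defer phase */
};

/* Start the server; a clean start is assumed to enter the defer phase. */
static void toy_server_start(struct toy_dl_server *s)
{
	if (s->active)
		return;
	s->active = true;
	/* Assumption: a stale defer_running makes start skip the defer phase. */
	if (!s->defer_running)
		s->defer_armed = true;
}

/* Model the deferred start firing: armed -> running. */
static void toy_server_defer_fired(struct toy_dl_server *s)
{
	if (s->active && s->defer_armed) {
		s->defer_armed = false;
		s->defer_running = true;
	}
}

/* Stop the server, optionally clearing defer_running unconditionally. */
static void toy_server_stop(struct toy_dl_server *s, bool reset_unconditionally)
{
	s->active = false;
	s->defer_armed = false;
	if (reset_unconditionally)
		s->defer_running = false;
}

/* Return true if a start after a stop re-enters the defer phase cleanly. */
static bool toy_restart_enters_defer(bool reset_unconditionally)
{
	struct toy_dl_server s = { 0 };

	toy_server_start(&s);
	toy_server_defer_fired(&s);
	assert(s.defer_running);

	toy_server_stop(&s, reset_unconditionally);
	toy_server_start(&s);

	return s.defer_armed && !s.defer_running;
}
```

In this model, only the unconditional reset makes every start begin from the
same state, regardless of what happened before the previous stop.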

We can always revisit preserving the short-sleep optimization later if we
find a way to do it with stronger guarantees (and I'll keep investigating
this), but for now the unconditional reset seems like the most robust fix
to me.

Opinions? Peter / Juri?

Thanks,
-Andrea
