linux-kernel - Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with expired deadline

Message-ID: <48ee3f26-7dbc-4c59-b98d-f9aeed980a43@redhat.com>
Date: Sat, 1 Nov 2025 08:43:51 +0000 (UTC)
From: Gabriele Monaco <gmonaco@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>, Clark Williams <williams@...hat.com>,
	arighi@...dia.com
Subject: Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with
 expired deadline

2025-11-01T00:08:37Z Peter Zijlstra <peterz@...radead.org>:

> On Fri, Oct 31, 2025 at 04:41:22PM +0100, Gabriele Monaco wrote:
>> On Fri, 2025-10-31 at 16:20 +0100, Peter Zijlstra wrote:
>>> On Fri, Oct 31, 2025 at 02:24:17PM +0100, Gabriele Monaco wrote:
>>>>
>>>> Different scenario if I have the CPU busy with other tasks (e.g. RT
>>>> policies), there I can see the server stopping and starting again.
>>>> After I do this I seem to get a different behaviour (even some boosting
>>>> after idle), I'm trying to understand what's going on.
>>>>
>>
>> After running some heavy RT workload (stress-ng --cpu 10 --sched rr) I do see
>> the server stopping and starting as the models would expect, but somehow it's
>> always boosting as soon as it's started.
>>
>> Apparently dl_defer_running is always 1 in that scenario. Perhaps running idle
>> counts as running something too, so it never defers. But I can't really see how
>> this happens..
>
> The transition [4], will retain dl_defer_running, such that a timely
> re-start of the dl_server can immediately run again.

Alright, I worded that poorly. As far as I understand, what you mentioned is the desired behaviour when handling starvation: we don't defer, and we start the next period boosting.
What I was observing was the server staying running indefinitely.
I ran a test with 5s of RR stress-ng and 30s of mostly idle DL workload on a clean VM. I expected boosting only during the first 5 seconds, but I also see it afterwards, when there was clearly no starvation (the system was idle, which is probably a bit hard to see from the trace I shared).

Thanks for the updated patch, I'll try that and see how it goes.

Gabriele