Message-ID: <cc33e6e3-3ffe-4ebc-a2e3-7dd7afe13538@redhat.com>
Date: Mon, 26 Jan 2026 16:56:52 +0000
From: Gabriele Monaco <gmonaco@...hat.com>
To: Andrea Righi <arighi@...dia.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>,
	Joel Fernandes <joelagnelf@...dia.com>,
	David Vernet <void@...ifault.com>,
	Changwoo Min <changwoo@...lia.com>, Daniel Hodges <hodgesd@...a.com>,
	sched-ext@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched/deadline: Reset dl_server execution state on
 stop

2026-01-26T16:30:45Z Andrea Righi <arighi@...dia.com>:

> Hi Gabriele,
>
> On Mon, Jan 26, 2026 at 03:20:12PM +0100, Gabriele Monaco wrote:

>> In the sequence you described above, I wonder why the enqueue is never
>> replenishing. As far as I understand the runtime should remain <= 0 only as long
>> as the enqueue occurs before the deadline, after that it should simply replenish
>> a new period (pushing deadline and restoring runtime).
>>
>> What am I missing here?
>
> Replenishment is not triggered directly by enqueueing, but by the
> deferral/replenishment timer. In this case the timer is never armed: stale
> dl_defer_running makes the enqueue path believe the server is already in
> the running phase, which suppresses deferral arming, causing
> start_dl_timer() to be skipped.
>

Hi Andrea,

Thanks for the clarification, but I think I have observed the enqueue path (via dl_server_start()) replenishing a new period when running.

Something like:
dl_server_start()
  enqueue_dl_entity(ENQUEUE_WAKEUP)
    update_dl_entity()
      replenish_dl_new_period()

should happen if the deadline is in the past, unless I'm missing some condition down the road.

Still, if it starts before the deadline, the server is going to get throttled, as you observed. Perhaps, since the CPU isn't idle in your tests, we don't stop the server after that dequeue, so we never replenish after the deadline (because we never start it again and, as you mentioned, the timer is not armed).

Can this be what you're observing?

Thanks,
Gabriele


> Thanks,
> -Andrea
>
>>
>> Thanks,
>> Gabriele
>>
>> [1] -
>> https://lore.kernel.org/lkml/20251111111716.GL278048@noisy.programming.kicks-ass.net
>>
>>>
>>> This results in starvation of the tasks serviced by the deadline server
>>> in the presence of competing RT workloads.
>>>
>>> This issue can be confirmed adding debugging traces, which show that the
>>> server skips the deferral timer and is immediately throttled upon
>>> execution with negative runtime:
>>>
>>>  DEBUG: dl_server_start: dl_defer_running=1 active=0
>>>  DEBUG: enqueue_dl_entity: flags=1 dl_throttled=0 dl_defer=1
>>>  DEBUG: update_dl_entity: dl_defer_running=1
>>>  DEBUG: enqueue_dl_entity: SKIPPING start_dl_timer! dl_throttled=0
>>>  ...
>>>  DEBUG: update_curr_dl_se: THROTTLED runtime=-954758
>>>
>>> Fix this by properly resetting dl_defer_running in dl_server_stop(),
>>> ensuring the server correctly enters the defer phase upon restart.
>>>
>>> This issue is quite difficult to observe when only the fair server
>>> is present, as the required stop/start patterns are relatively rare.
>>> However, it becomes easier to trigger with an additional deadline server
>>> with more frequent server lifecycle transitions (such as a sched_ext
>>> deadline server).
>>>
>>> This change is a prerequisite for introducing a sched_ext deadline
>>> server, as it ensures correct and predictable behavior across server
>>> stop/start cycles.
>>>
>>> Link: https://lore.kernel.org/all/aXEMat4IoNnGYgxw@gpd4/
>>> Signed-off-by: Andrea Righi <arighi@...dia.com>
>>> ---
>>> Changes in v2:
>>>  - Update state machine documentation
>>>  - Link to v1:
>>> https://lore.kernel.org/all/20260122140833.1655020-1-arighi@nvidia.com/
>>>
>>>  kernel/sched/deadline.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>>> index c509f2e7d69de..e42867061ea77 100644
>>> --- a/kernel/sched/deadline.c
>>> +++ b/kernel/sched/deadline.c
>>> @@ -1615,7 +1615,7 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
>>>   *   dl_server_active = 0
>>>   *   dl_throttled = 0
>>>   *   dl_defer_armed = 0
>>> - *   dl_defer_running = 0/1
>>> + *   dl_defer_running = 0
>>>   *   dl_defer_idle = 0
>>>   *
>>>   * [B] - zero_laxity-wait
>>> @@ -1704,6 +1704,7 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
>>>   *       hrtimer_try_to_cancel();
>>>   *       dl_defer_armed = 0;
>>>   *       dl_throttled = 0;
>>> + *       dl_defer_running = 0;
>>>   *       dl_server_active = 0;
>>>   *       // [A]
>>>   *   return p;
>>> @@ -1813,6 +1814,7 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
>>>     hrtimer_try_to_cancel(&dl_se->dl_timer);
>>>     dl_se->dl_defer_armed = 0;
>>>     dl_se->dl_throttled = 0;
>>> +   dl_se->dl_defer_running = 0;
>>>     dl_se->dl_defer_idle = 0;
>>>     dl_se->dl_server_active = 0;
>>>  }
>>

