Message-ID: <aO5zxvoCPNfWwfoK@jlelli-thinkpadt14gen4.remote.csb>
Date: Tue, 14 Oct 2025 18:01:10 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>,
	Clark Williams <williams@...hat.com>
Subject: Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with
 expired deadline

On 14/10/25 17:32, Gabriele Monaco wrote:
> On Tue, 2025-10-14 at 12:25 +0200, Peter Zijlstra wrote:
> > 
> > Let's be confused together :-)
> > 
> > So dl_server is active, but machine is otherwise idle, this means
> > dl_server_timer is pending, right?
> 
> It may not be. As far as I can see from the trace, the timer expires at the
> last replenish before this "error" and is only restarted a while later, when
> the boosted task is throttled by a tick.
> 
> > 
> > This timer is in one of two states:
> > 
> >  - waiting for replenish; which will trigger and switch to 0-laxity.
> >  - waiting for 0-laxity
> > 
> > So that 0-laxity case is the interesting one; when the machine really is
> > idle, no fair tasks will run and its runtime budget will not get
> > depleted. Therefore, once we hit 0-laxity, it will do
> > enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH).

Shouldn't idle time be accounted (subtracted from runtime) as well, though?
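
To make the 0-laxity state quoted above concrete, here is a minimal sketch in
plain user-space C, not kernel code. The function names are hypothetical, and
it assumes laxity is simply the deadline minus the current time minus the
remaining runtime, all in nanoseconds.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; these are not kernel functions.
 * All values are in nanoseconds.
 */

/* Instant at which the remaining runtime exactly fills the time to the deadline. */
static uint64_t zero_laxity_ns(uint64_t deadline_ns, uint64_t runtime_ns)
{
	return deadline_ns - runtime_ns;
}

/* Laxity at now_ns: zero at the 0-laxity instant, negative once it has passed. */
static int64_t laxity_ns(uint64_t now_ns, uint64_t deadline_ns, uint64_t runtime_ns)
{
	return (int64_t)(deadline_ns - now_ns - runtime_ns);
}
```

With the runtime=50000000 and deadline=14270340997 values from the first
sched_dl_replenish line in the trace below, zero_laxity_ns() returns
14220340997. If the runtime is not depleted while idle, that instant stays
fixed until the next replenish.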

> > This enqueue should ensure dl_se->deadline is in the future, right?
> 
> Yes, this enqueue replenishes (as I can see in the trace), but that doesn't
> restart the timer. The server gets to replenish_dl_entity with dl_defer_armed,
> toggles that and doesn't start a timer (should it?).
> 
> > Anyway, we run this deadline entity (there ain't nothing else to do
> > anyway), and it finds there aren't any fair tasks, it does
> > dl_server_stop().
> 
> As far as I see, we do reschedule when enqueueing this server entity, but we
> don't stop the server (should we though?).
> I'm going to gather some more traces to understand what's happening in there.
> 
> Here is a trace where you see the schedule just after that replenish before the
> error, but no server stop in there (we have tracepoints so you'd see it):
> 
>       <idle>-0    d.h3.  13.347981: (+3)     sched_dl_replenish:   comm=server pid=-13 runtime=50000000 deadline=14270340997 yielded=0
>       <idle>-0    .N.2.  13.348043: (+62)    sched_entry:          without preemption
>       <idle>-0    ...2.  13.348048: (+5)     sched_exit:           without switch
>       <idle>-0    .N.2.  14.942485: (+1594437) sched_entry:          without preemption
>       <idle>-0    dN.2.  14.942498: (+13)    bprint:               pick_task_dl: Server picked ksoftirqd/13-126 (runtime 0)
>       <idle>-0    d..3.  14.942519: (+21)    event_nomiss:         -13: ready x sched_switch_in -> running
>       <idle>-0    d..2.  14.942521: (+2)     sched_switch:         swapper/13:0 [120] R ==> ksoftirqd/13:126 [120]
> ksoftirqd/13-126  ...2.  14.942528: (+7)     sched_exit:           with switch
> ksoftirqd/13-126  ...2.  14.942566: (+38)    sched_entry:          without preemption
> ksoftirqd/13-126  d..3.  14.942588: (+22)    error_env_nomiss:     -13: event dl_throttle not expected in the state running with env clk=593612020
> ksoftirqd/13-126  d..3.  14.942592: (+4)     sched_dl_throttle:    comm=server pid=-13 runtime=-92390 deadline=14270340997 yielded=0
> ksoftirqd/13-126  d..3.  14.942601: (+9)     sched_dl_replenish:   comm=server pid=-13 runtime=50000000 deadline=15864951976 yielded=0
> ksoftirqd/13-126  d..2.  14.942623: (+22)    sched_switch:         ksoftirqd/13:126 [120] S ==> rcuc/13:124 [98]
>      rcuc/13-124  ...2.  14.942628: (+5)     sched_exit:           with switch
> 
> > Then, if a fair task wakes (nr_running: 0->1) and dl_server isn't
> > active, we do dl_server_start(), which in turn does enqueue_dl_entity().
> > Now this enqueue is supposed to check if the dl_entity can still run;
> > does it still have time left in its current period, if not, it's
> > replenish timer time.
> > 
> > 
> > So where exactly does the fair task start, and result in dl_se being
> > on_rq such that dl_deadline is in the past? That sounds like an enqueue
> > problem to me.
> > 
> > 
> 
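
On the question of dl_se being on_rq with a deadline in the past: below is a
rough user-space sketch of the kind of check involved, assuming a
wraparound-safe signed comparison in the style of dl_time_before().
deadline_expired() is a hypothetical name, and this is not the actual enqueue
path.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a is earlier than b; the signed difference tolerates u64 wraparound. */
static bool time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * Hypothetical helper: reports whether an entity's absolute deadline has
 * already passed at now_ns. Both arguments are in nanoseconds.
 */
static bool deadline_expired(uint64_t deadline_ns, uint64_t now_ns)
{
	return time_before(deadline_ns, now_ns);
}
```

Taking the trace timestamp 14.942498 as now (14942498000 ns, which assumes the
trace clock and the deadline field share a time base, and they may be offset),
deadline_expired(14270340997, 14942498000) is true. That matches the server
picking ksoftirqd with the old deadline. After the second replenish,
deadline=15864951976 is ahead of 14942601000 and the check is false.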

