Message-ID: <aO5zxvoCPNfWwfoK@jlelli-thinkpadt14gen4.remote.csb>
Date: Tue, 14 Oct 2025 18:01:10 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>,
Clark Williams <williams@...hat.com>
Subject: Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with
expired deadline
On 14/10/25 17:32, Gabriele Monaco wrote:
> On Tue, 2025-10-14 at 12:25 +0200, Peter Zijlstra wrote:
> >
> > Let's be confused together :-)
> >
> > So dl_server is active, but machine is otherwise idle, this means
> > dl_server_timer is pending, right?
>
> It may not be. As far as I see from the trace, the timer expires at the last
> replenish before this "error" and is only restarted a while after, when the
> boosted task is throttled by a tick.
>
> >
> > This timer is in one of two states:
> >
> > - waiting for replenish; which will trigger and switch to 0-laxity.
> > - waiting for 0-laxity
> >
> > So that 0-laxity case is the interesting one; when the machine really is
> > idle, no fair tasks will run and its runtime budget will not get
> > depleted. Therefore, once we hit 0-laxity, it will do
> > enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH).

Shouldn't idle time be accounted (subtracted from runtime) as well, though?

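To make the "waiting for 0-laxity" state quoted above concrete, here is a minimal userspace sketch of how that instant relates to the values printed in the trace. It assumes the timer is armed at (deadline - remaining runtime); struct server_model and zero_laxity_time() are hypothetical names for illustration, not kernel symbols.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a deferred server (not struct sched_dl_entity). */
struct server_model {
	uint64_t deadline_ns;	/* absolute deadline */
	int64_t  runtime_ns;	/* remaining budget, may go negative when overrun */
};

/*
 * Zero-laxity instant: the latest time at which the server can start
 * running and still consume all of its remaining budget by the deadline.
 * Assumption: with no budget left there is nothing to defer, so the
 * deadline itself is returned.
 */
static uint64_t zero_laxity_time(const struct server_model *s)
{
	if (s->runtime_ns <= 0)
		return s->deadline_ns;
	if ((uint64_t)s->runtime_ns >= s->deadline_ns)
		return 0;	/* guard against unsigned underflow */
	return s->deadline_ns - (uint64_t)s->runtime_ns;
}
```

With the values from the first sched_dl_replenish line in the trace (runtime=50000000, deadline=14270340997), this model puts the 0-laxity instant at 14220340997.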
> > This enqueue should ensure dl_se->deadline is in the future, right?
>
> Yes, this enqueue replenishes (as I can see in the trace), but that doesn't
> restart the timer. The server gets to replenish_dl_entity with dl_defer_armed,
> toggles that and doesn't start a timer (should it?).
>
> > Anyway, we run this deadline entity (there ain't nothing else to do
> > anyway), and it finds there aren't any fair tasks, it does
> > dl_server_stop().
>
> As far as I see, we do reschedule when enqueueing this server entity, but we
> don't stop the server (should we though?).
> I'm going to gather some more traces to understand what's happening in there.
>
> Here is a trace where you see the schedule just after that replenish before the
> error, but no server stop in there (we have tracepoints so you'd see it):
>
> <idle>-0 d.h3. 13.347981: (+3) sched_dl_replenish: comm=server pid=-13 runtime=50000000 deadline=14270340997 yielded=0
> <idle>-0 .N.2. 13.348043: (+62) sched_entry: without preemption
> <idle>-0 ...2. 13.348048: (+5) sched_exit: without switch
> <idle>-0 .N.2. 14.942485: (+1594437) sched_entry: without preemption
> <idle>-0 dN.2. 14.942498: (+13) bprint: pick_task_dl: Server picked ksoftirqd/13-126 (runtime 0)
> <idle>-0 d..3. 14.942519: (+21) event_nomiss: -13: ready x sched_switch_in -> running
> <idle>-0 d..2. 14.942521: (+2) sched_switch: swapper/13:0 [120] R ==> ksoftirqd/13:126 [120]
> ksoftirqd/13-126 ...2. 14.942528: (+7) sched_exit: with switch
> ksoftirqd/13-126 ...2. 14.942566: (+38) sched_entry: without preemption
> ksoftirqd/13-126 d..3. 14.942588: (+22) error_env_nomiss: -13: event dl_throttle not expected in the state running with env clk=593612020
> ksoftirqd/13-126 d..3. 14.942592: (+4) sched_dl_throttle: comm=server pid=-13 runtime=-92390 deadline=14270340997 yielded=0
> ksoftirqd/13-126 d..3. 14.942601: (+9) sched_dl_replenish: comm=server pid=-13 runtime=50000000 deadline=15864951976 yielded=0
> ksoftirqd/13-126 d..2. 14.942623: (+22) sched_switch: ksoftirqd/13:126 [120] S ==> rcuc/13:124 [98]
> rcuc/13-124 ...2. 14.942628: (+5) sched_exit: with switch
>
> > Then, if a fair task wakes (nr_running: 0->1) and dl_server isn't
> > active, we do dl_server_start(), which in turn does enqueue_dl_entity().
> > Now this enqueue is supposed to check if the dl_entity can still run;
> > does it still have time left in its current period? If not, it's
> > replenish timer time.
> >
> >
> > So where exactly does the fair task start, and result in dl_se being
> > on_rq such that dl_deadline is in the past? That sounds like an enqueue
> > problem to me.
> >
> >
>
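The enqueue-time check described in the quoted text ("does it still have time left in its current period") can be sketched as a CBS-style test. This is a simplified userspace model under stated assumptions, not the kernel code: struct dl_model and needs_replenish() are hypothetical names, and the 64-bit products are assumed not to overflow for the parameter ranges used here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fields involved in the check. */
struct dl_model {
	uint64_t dl_runtime;	/* budget per period (ns) */
	uint64_t dl_deadline;	/* relative deadline (ns) */
	uint64_t deadline;	/* current absolute deadline (ns) */
	int64_t  runtime;	/* remaining budget (ns) */
};

/*
 * Return true if the entity must get a fresh runtime/deadline pair
 * instead of running with its current one, i.e. if:
 *   - the absolute deadline is not in the future, or
 *   - no budget is left, or
 *   - using the leftover budget before the deadline would exceed the
 *     reserved bandwidth:
 *       runtime / (deadline - now) > dl_runtime / dl_deadline
 * The inequality is cross-multiplied to avoid division.
 */
static bool needs_replenish(const struct dl_model *se, uint64_t now)
{
	uint64_t left, right;

	if (now >= se->deadline)
		return true;
	if (se->runtime <= 0)
		return true;

	left  = se->dl_deadline * (uint64_t)se->runtime;
	right = (se->deadline - now) * se->dl_runtime;
	return right < left;
}
```

In this model, an entity whose deadline is already behind the clock always takes the replenish path, which is the property the question about dl_se being on_rq with a past deadline hinges on.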