Message-ID: <83a5971ef07226737421737f889795ec57b6fa6c.camel@redhat.com>
Date: Tue, 14 Oct 2025 17:32:19 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>, Ingo
Molnar <mingo@...hat.com>, Clark Williams <williams@...hat.com>
Subject: Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with
expired deadline
On Tue, 2025-10-14 at 12:25 +0200, Peter Zijlstra wrote:
>
> Let's be confused together :-)
>
> So dl_server is active, but machine is otherwise idle, this means
> dl_server_timer is pending, right?
It may not be. As far as I can see from the trace, the timer expires at the last
replenish before this "error" and is only restarted a while later, when the
boosted task is throttled by a tick.
>
> This timer is in one of two states:
>
> - waiting for replenish; which will trigger and switch to 0-laxity.
> - waiting for 0-laxity
>
> So that 0-laxity case is the interesting one; when the machine really is
> idle, no fair tasks will run and its runtime budget will not get
> depleted. Therefore, once we hit 0-laxity, it will do
> enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH).
>
> This enqueue should ensure dl_se->deadline is in the future, right?
Yes, this enqueue replenishes (I can see that in the trace), but it doesn't
restart the timer. The server reaches replenish_dl_entity() with dl_defer_armed
set, clears that flag and doesn't start a timer (should it?).
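For reference, here is a userspace model of how I read the expiry selection in start_dl_timer(). While dl_defer_armed is set, the timer targets the 0-laxity instant (deadline - runtime). Otherwise it targets the next replenishment (deadline - dl_deadline + dl_period). If replenish_dl_entity() only clears the flag and returns, neither expiry gets programmed at that point. struct dl_model and its helpers are hypothetical stand-ins for the sched_dl_entity fields involved, not kernel code, and the example values assume a 50 ms / 1 s server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down model of the relevant sched_dl_entity fields. */
struct dl_model {
	uint64_t deadline;    /* absolute deadline, ns */
	int64_t runtime;      /* remaining budget, ns */
	uint64_t dl_deadline; /* relative deadline, ns */
	uint64_t dl_period;   /* period, ns */
	bool dl_defer_armed;  /* deferred server waiting for 0-laxity */
};

/* Start of the next period: absolute deadline - relative deadline + period. */
static uint64_t next_period(const struct dl_model *dl)
{
	return dl->deadline - dl->dl_deadline + dl->dl_period;
}

/*
 * Expiry a server timer would be programmed with, modelled on the choice
 * in start_dl_timer(): the 0-laxity instant (deadline - runtime) while the
 * deferral is armed, otherwise the next replenishment.
 */
static uint64_t timer_expiry(const struct dl_model *dl)
{
	if (dl->dl_defer_armed) {
		assert(dl->runtime >= 0); /* 0-laxity needs a non-negative budget */
		return dl->deadline - (uint64_t)dl->runtime;
	}
	return next_period(dl);
}
```

For example, with deadline = 2 s, runtime = 50 ms and a 1 s period, the armed case gives 1.95 s and the unarmed case gives 2 s.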
> Anyway, we run this deadline entity (there ain't nothing else to do
> anyway), and it finds there aren't any fair tasks, it does
> dl_server_stop().
As far as I can see, we do reschedule when enqueueing this server entity, but we
don't stop the server (should we, though?).
I'm going to gather more traces to understand what's happening there.
Here is a trace showing the schedule right after that replenish and before the
error, with no server stop in between (we have tracepoints, so it would show up):
<idle>-0 d.h3. 13.347981: (+3) sched_dl_replenish: comm=server pid=-13 runtime=50000000 deadline=14270340997 yielded=0
<idle>-0 .N.2. 13.348043: (+62) sched_entry: without preemption
<idle>-0 ...2. 13.348048: (+5) sched_exit: without switch
<idle>-0 .N.2. 14.942485: (+1594437) sched_entry: without preemption
<idle>-0 dN.2. 14.942498: (+13) bprint: pick_task_dl: Server picked ksoftirqd/13-126 (runtime 0)
<idle>-0 d..3. 14.942519: (+21) event_nomiss: -13: ready x sched_switch_in -> running
<idle>-0 d..2. 14.942521: (+2) sched_switch: swapper/13:0 [120] R ==> ksoftirqd/13:126 [120]
ksoftirqd/13-126 ...2. 14.942528: (+7) sched_exit: with switch
ksoftirqd/13-126 ...2. 14.942566: (+38) sched_entry: without preemption
ksoftirqd/13-126 d..3. 14.942588: (+22) error_env_nomiss: -13: event dl_throttle not expected in the state running with env clk=593612020
ksoftirqd/13-126 d..3. 14.942592: (+4) sched_dl_throttle: comm=server pid=-13 runtime=-92390 deadline=14270340997 yielded=0
ksoftirqd/13-126 d..3. 14.942601: (+9) sched_dl_replenish: comm=server pid=-13 runtime=50000000 deadline=15864951976 yielded=0
ksoftirqd/13-126 d..2. 14.942623: (+22) sched_switch: ksoftirqd/13:126 [120] S ==> rcuc/13:124 [98]
rcuc/13-124 ...2. 14.942628: (+5) sched_exit: with switch
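Reading the deadlines in the trace numerically: the server is picked with deadline=14270340997, and the throttle replenishes it to deadline=15864951976. The sketch below makes two assumptions. First, the fair server uses a 1 s relative deadline. Second, this replenish started a fresh period (deadline = now + dl_deadline). Under those assumptions it recovers the rq clock at replenish time and measures how long the stale deadline had already expired. The comparison is a userspace reimplementation of the kernel's wrap-safe dl_time_before(), and expired_by() is a hypothetical helper. Only the deadline values are compared, since the trace timestamps may be on a different clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-safe "a is before b" on u64 nanosecond timestamps, reimplemented
 * after the kernel's dl_time_before() (sign of the difference).
 */
static bool dl_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * Hypothetical helper. Assuming the replenish set
 * new_deadline = now + dl_deadline, recover "now" and return how many ns
 * old_deadline had already expired by then (0 if it had not expired).
 */
static uint64_t expired_by(uint64_t old_deadline, uint64_t new_deadline,
			   uint64_t dl_deadline)
{
	uint64_t now = new_deadline - dl_deadline;

	if (!dl_time_before(old_deadline, now))
		return 0;
	return now - old_deadline;
}
```

With the trace values, expired_by(14270340997, 15864951976, 1000000000) returns 594610979. Under these assumptions, the server ran with a deadline roughly 594.6 ms in the past.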
> Then, if a fair task wakes (nr_running: 0->1) and dl_server isn't
> active, we do dl_server_start(), which in turn does enqueue_dl_entity().
> Now this enqueue is supposed to check if the dl_entity can still run;
> does it still have time left in its current period? If not, it's
> replenish timer time.
>
>
> So where exactly does the fair task start, and result in dl_se being
> on_rq such that dl_deadline is in the past? That sounds like an enqueue
> problem to me.
>
>
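To make the enqueue-time check concrete, here is a userspace sketch of the classic CBS wakeup rule. The current (deadline, runtime) pair is kept only if the deadline is still ahead and the leftover budget fits within the reserved bandwidth until that deadline. Otherwise the entity gets a new period. As far as I understand, update_dl_entity() applies this rule (via dl_entity_overflow()) for implicit-deadline entities. The deferred-server branch there is not modelled. struct cbs_model, MODEL_SCALE and the function names are hypothetical, and runtime is simplified to a non-negative value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Drop the 10 low bits of each operand so the products stay within 64 bits. */
#define MODEL_SCALE 10

/* Hypothetical model of the fields the CBS wakeup rule looks at. */
struct cbs_model {
	uint64_t deadline;    /* absolute deadline, ns */
	uint64_t runtime;     /* remaining budget, ns (assumed >= 0 here) */
	uint64_t dl_runtime;  /* reserved budget per period, ns */
	uint64_t dl_deadline; /* relative deadline, ns */
};

/* Wrap-safe timestamp comparison: true if a is before b. */
static bool time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * True if the leftover budget cannot be consumed before the current
 * deadline without exceeding the reserved bandwidth, i.e.
 *   runtime / (deadline - now) > dl_runtime / dl_deadline,
 * evaluated cross-multiplied on scaled-down operands.
 */
static bool cbs_overflow(const struct cbs_model *dl, uint64_t now)
{
	uint64_t left = (dl->dl_deadline >> MODEL_SCALE) *
			(dl->runtime >> MODEL_SCALE);
	uint64_t right = ((dl->deadline - now) >> MODEL_SCALE) *
			 (dl->dl_runtime >> MODEL_SCALE);

	return right < left;
}

/* On enqueue: true if the current deadline/runtime can't be reused. */
static bool needs_new_period(const struct cbs_model *dl, uint64_t now)
{
	/* Short-circuit: cbs_overflow() is only reached with deadline >= now. */
	return time_before(dl->deadline, now) || cbs_overflow(dl, now);
}
```

In this model, an entity enqueued with its deadline already in the past always gets a fresh period. That is the property an enqueue on dl_server_start() would be expected to guarantee.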