Message-ID: <83a5971ef07226737421737f889795ec57b6fa6c.camel@redhat.com>
Date: Tue, 14 Oct 2025 17:32:19 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>, Ingo Molnar <mingo@...hat.com>, Clark Williams <williams@...hat.com>
Subject: Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with expired deadline

On Tue, 2025-10-14 at 12:25 +0200, Peter Zijlstra wrote:
> 
> Let's be confused together :-)
> 
> So dl_server is active, but machine is otherwise idle, this means
> dl_server_timer is pending, right?

It may not be. As far as I can see from the trace, the timer expires at the last
replenish before this "error" and is only restarted a while later, when the
boosted task is throttled by a tick.

> 
> This timer is in one of two states:
> 
>  - waiting for replenish; which will trigger and switch to 0-laxity.
>  - waiting for 0-laxity
> 
> So that 0-laxity case is the interesting one; when the machine really is
> idle, no fair tasks will run and its runtime budget will not get
> depleted. Therefore, once we hit 0-laxity, it will do
> enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH).
> 
> This enqueue should ensure dl_se->deadline is in the future, right?

Yes, this enqueue replenishes (as I can see in the trace), but that doesn't
restart the timer. The server gets to replenish_dl_entity() with dl_defer_armed
set, toggles it and doesn't start a timer (should it?).

> Anyway, we run this deadline entity (there ain't nothing else to do
> anyway), and it finds there aren't any fair tasks, it does
> dl_server_stop().

As far as I can see, we do reschedule when enqueueing this server entity, but we
don't stop the server (should we, though?).
I'm going to gather some more traces to understand what's happening there.

Here is a trace showing the schedule just after that replenish, before the
error, with no server stop in between (we have tracepoints for it, so you'd see it):

      <idle>-0    d.h3.  13.347981: (+3)     sched_dl_replenish:   comm=server pid=-13 runtime=50000000 deadline=14270340997 yielded=0
      <idle>-0    .N.2.  13.348043: (+62)    sched_entry:          without preemption
      <idle>-0    ...2.  13.348048: (+5)     sched_exit:           without switch
      <idle>-0    .N.2.  14.942485: (+1594437) sched_entry:          without preemption
      <idle>-0    dN.2.  14.942498: (+13)    bprint:               pick_task_dl: Server picked ksoftirqd/13-126 (runtime 0)
      <idle>-0    d..3.  14.942519: (+21)    event_nomiss:         -13: ready x sched_switch_in -> running
      <idle>-0    d..2.  14.942521: (+2)     sched_switch:         swapper/13:0 [120] R ==> ksoftirqd/13:126 [120]
ksoftirqd/13-126  ...2.  14.942528: (+7)     sched_exit:           with switch
ksoftirqd/13-126  ...2.  14.942566: (+38)    sched_entry:          without preemption
ksoftirqd/13-126  d..3.  14.942588: (+22)    error_env_nomiss:     -13: event dl_throttle not expected in the state running with env clk=593612020
ksoftirqd/13-126  d..3.  14.942592: (+4)     sched_dl_throttle:    comm=server pid=-13 runtime=-92390 deadline=14270340997 yielded=0
ksoftirqd/13-126  d..3.  14.942601: (+9)     sched_dl_replenish:   comm=server pid=-13 runtime=50000000 deadline=15864951976 yielded=0
ksoftirqd/13-126  d..2.  14.942623: (+22)    sched_switch:         ksoftirqd/13:126 [120] S ==> rcuc/13:124 [98]
     rcuc/13-124  ...2.  14.942628: (+5)     sched_exit:           with switch

> Then, if a fair task wakes (nr_running: 0->1) and dl_server isn't
> active, we do dl_server_start(), which in turn does enqueue_dl_entity().
> Now this enqueue is supposed to check if the dl_entity can still run;
> does it still have time left in its current period? If not, it's
> replenish timer time.
> 
> 
> So where exactly does the fair task start, and result in dl_se being
> on_rq such that dl_deadline is in the past? That sounds like an enqueue
> problem to me.
> 
> 

