[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aQzWM4vv30etfhok@gpd4>
Date: Thu, 6 Nov 2025 18:09:07 +0100
From: Andrea Righi <arighi@...dia.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>,
David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>, Shuah Khan <shuah@...nel.org>,
Joel Fernandes <joelagnelf@...dia.com>,
Christian Loehle <christian.loehle@....com>,
Emil Tsalapatis <emil@...alapatis.com>,
Luigi De Matteis <ldematteis123@...il.com>,
sched-ext@...ts.linux.dev, bpf@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 04/11] sched/deadline: Add support to initialize and
remove dl_server bandwidth
On Thu, Nov 06, 2025 at 10:49:20AM +0100, Juri Lelli wrote:
> Hi,
>
> On 29/10/25 20:08, Andrea Righi wrote:
> > During switching from sched_ext to fair tasks and vice-versa, we need
> > support for intializing and removing the bandwidth contribution of
> > either DL server.
>
> My first and more general/design question is do we strictly need this
> automagic bandwidth management. We seem to agree [1] that we want to
> move towards explicit dl-server(s) and tasks bandwidth handling, so we
> might want to consider leaving the burden completely to whomever might
> be configuring the system.
I think we decided to take this approach because, once a sched_ext
scheduler is loaded and all tasks are moved to the ext class, the fair
class becomes "empty", but the fair dl-server would still keep its
bandwidth reserved, so somehow we need to release that reservation,
right?
>
> > Add support for handling these transitions.
>
> Anyway, if we still want to do this :) ...
>
> > Moreover, remove references specific to the fair server, in preparation
> > for adding the ext server.
> >
> > v2: - wait for inactive_task_timer to fire before removing the bandwidth
> > reservation (Juri Lelli)
> > - add WARN_ON_ONCE(!cpus) sanity check in dl_server_apply_params()
> > (Andrea Righi)
> >
> > Co-developed-by: Joel Fernandes <joelagnelf@...dia.com>
> > Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
> > Signed-off-by: Andrea Righi <arighi@...dia.com>
> > ---
>
> ...
>
> > +/**
> > + * dl_server_remove_params - Remove bandwidth reservation for a DL server
> > + * @dl_se: The DL server entity to remove bandwidth for
> > + *
> > + * This function removes the bandwidth reservation for a DL server entity,
> > + * cleaning up all bandwidth accounting and server state.
> > + *
> > + * Returns: 0 on success, negative error code on failure
> > + */
> > +int dl_server_remove_params(struct sched_dl_entity *dl_se,
> > + struct rq *rq, struct rq_flags *rf)
> > +{
> > + if (!dl_se->dl_server)
> > + return 0; /* Already disabled */
> > +
> > + /*
> > + * First dequeue if still queued. It should not be queued since
> > + * we call this only after the last dl_server_stop().
> > + */
> > + if (WARN_ON_ONCE(on_dl_rq(dl_se)))
> > + dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
> > +
> > + if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == -1) {
> > + rq_unlock_irqrestore(rq, rf);
>
> This seems racy. I fear the moment we release the rq lock something can
> slip in and the server(s) state might change?
Yeah, however, sched_ext is the only user of dl_server_remove_params() (at
least for now), this is called when an scx scheduler is started/stopped and
concurrent starts/stops can't happen, so maybe we're safe...? And in that
case it'd be helpful to document this scenario clearly, unless we want to
rethink this part and avoid cancelling the timer.
>
> > +
> > + hrtimer_cancel(&dl_se->inactive_timer);
>
> I am not sure we actually need to force cancel the timer (but still
> contradicting myself every time I go back at staring at code :). The way
> I believe this should work 'in theory' is
>
> - we remove a server (either automagic or user sets runtime to 0 -
> which is probably to fix/look at in current implementation as well
> btw)
> - current bandwidth is retained and only freed (and server reset) at
> 0-lag (when inactive_timer fires)
> - if server is activated back before 0-lag it will use it's current
> parameters
> - after 0-lag it's a new instance with new parameters
Hm... that means just setting the runtime to 0 IIUC. I think I tried that
approach in the past, but I was seeing some inconsistencies with the
total_bw kselftest, starting/stopping an scx scheduler multiple times
seemed to gradually consume all the available bandwidth.
But I can give it another try, maybe that behavior was caused by other
issues, since we've fixed quite a few things since then.
>
> In inactive_timer() we have this behavior for simple tasks, but we skip
> __dl_sub() etc for servers (since we clear it up immediately).
>
> In all this I essentially fear that if we clear parameters immediately
> one could be able to trick the system by quickly disabling/enabling a
> dl-server to let fair/scx tasks execute more than what requested (as
> each new enable will be seen as a new instance). But, again, I wasn't
> yet able to demonstrate this and I am still uncomfortably uncertain.
> Please Peter and others keep me honest.
>
> Also, server parameters changes are root only, so maybe not a big deal?
> For scx automagic as well?
Only root can start/stop scx schedulers, or better whoever has the CAP_BPF
or CAP_SYS_ADMIN capabilities. There's also some natural throttling in the
rate of scx restarts you can do, due to the overhead of switching to scx
bypass mode, re-enqueue all the tasks, and so on.
Maybe we could even explicitly throttle the rate of scx restarts if it
raises concerns from a security or stability standpoint.
Thanks,
-Andrea
Powered by blists - more mailing lists