[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xm26a5i7zi7g.fsf@google.com>
Date: Tue, 23 Jul 2024 18:34:11 -0700
From: Benjamin Segall <bsegall@...gle.com>
To: Valentin Schneider <vschneid@...hat.com>
Cc: linux-kernel@...r.kernel.org, rcu@...r.kernel.org, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, Juri Lelli
<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, Phil Auld
<pauld@...hat.com>, Clark Williams <williams@...hat.com>, Tomas Glozar
<tglozar@...hat.com>, "Paul E. McKenney" <paulmck@...nel.org>, Frederic
Weisbecker <frederic@...nel.org>, Neeraj Upadhyay
<neeraj.upadhyay@...nel.org>, Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>, Boqun Feng
<boqun.feng@...il.com>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>, Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, Alexander Gordeev
<agordeev@...ux.ibm.com>, Catalin Marinas <catalin.marinas@....com>,
Arnd Bergmann <arnd@...db.de>, Guo Ren <guoren@...nel.org>, Palmer
Dabbelt <palmer@...osinc.com>, Andrew Morton <akpm@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>, Jens Axboe <axboe@...nel.dk>
Subject: Re: [RFC PATCH v3 10/10] sched/fair: Throttle CFS tasks on return
to userspace
Valentin Schneider <vschneid@...hat.com> writes:
> On 18/07/24 17:25, Benjamin Segall wrote:
>> Valentin Schneider <vschneid@...hat.com> writes:
>>
>>> I've tested a 10ms runtime / 100ms period cgroup with an always running
>>> task: upstream gets a "clean" periodic pattern of 10ms runtime every 100ms,
>>> whereas this gets something more like 40ms runtime every 400ms.
>>
>> Hmm, this seems a little odd since TWA_RESUME does a kick_process.
>
> I didn't ponder too much on the workload used here, but the point I wanted
> to bring up is: if you give a cgroup X amount of runtime, it may still
> consume more than that within a single period because execution in
> kernelspace isn't immediately stopped/throttled.
>
> It means the "standard" bandwidth control behaviour becomes a bit more
> bursty.
Yeah, more bursty behavior when doing cpu-burning syscalls is expected.
With the check on exit to user I wouldn't expect anything worse than the
duration of the syscall though, so it depends on what your test was.
>>> +
>>> + /*
>>> + * Account tasks woken up in children; by this point all direct children
>>> + * have been visited.
>>> + */
>>> + task_delta += cfs_rq->unthrottled_h_nr_running;
>>> + idle_task_delta += cfs_rq->unthrottled_idle_h_nr_running;
>>> +
>>> + cfs_rq->h_nr_running += task_delta;
>>> + cfs_rq->idle_h_nr_running += idle_task_delta;
>>> +
>>> + /*
>>> + * unthrottle_cfs_rq() needs a value to up-propagate above the
>>> + * freshly unthrottled cfs_rq.
>>> + */
>>> + cfs_rq->unthrottled_h_nr_running = task_delta;
>>> + cfs_rq->unthrottled_idle_h_nr_running = idle_task_delta;
>>
>> I think this should have no effect, right?
>
> Hm so my thoughts here are:
> The walk_tg_tree_from(tg_unthrottle_up) will update *nr_running counts up
> to the cfs_rq->tg->se[cpu_of(rq)]. However if that cfs_rq isn't the root
> one, we'll need the for_each_sched_entity() loop further down
> unthrottle_cfs_rq() to update the upper part of the hierarchy. The values
> that will be up-propagated there are the ones being saved here.
I'm pretty sure this comment was left over from when I didn't understand
what they were being used for. I'm pretty sure I remember intending to
remove it.
Powered by blists - more mailing lists