linux-kernel - Re: [PATCH v3 1/2] sched, cgroup: Restore meaning to hierarchical

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230718125759.GA126587@lorien.usersys.redhat.com>
Date:   Tue, 18 Jul 2023 08:57:59 -0400
From:   Phil Auld <pauld@...hat.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Valentin Schneider <vschneid@...hat.com>,
        Ben Segall <bsegall@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mel Gorman <mgorman@...e.de>,
        Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH v3 1/2] sched, cgroup: Restore meaning to
 hierarchical_quota

On Mon, Jul 17, 2023 at 08:27:24AM -1000 Tejun Heo wrote:
> On Fri, Jul 14, 2023 at 08:57:46AM -0400, Phil Auld wrote:
> > In cgroupv2 cfs_b->hierarchical_quota is set to -1 for all task
> > groups due to the previous fix simply taking the min.  It should
> > reflect a limit imposed at that level or by an ancestor. Even
> > though cgroupv2 does not require child quota to be less than or
> > equal to that of its ancestors the task group will still be
> > constrained by such a quota so this should be shown here. Cgroupv1
> > continues to set this correctly.
> > 
> > In both cases, add initialization when a new task group is created
> > based on the current parent's value (or RUNTIME_INF in the case of
> > root_task_group). Otherwise, the field is wrong until a quota is
> > changed after creation and __cfs_schedulable() is called.
> > 
> > Fixes: c53593e5cb69 ("sched, cgroup: Don't reject lower cpu.max on ancestors")
> 
> Does this really fix anything observable? I wonder whether this is more
> misleading than helpful. In cgroup2, the value simply wasn't being used,
> right?
>

It wasn't being used but was actively being set wrong. I mean if we are
going to bother doing the __cfs_schedulable() tg tree walk we might as
well have not been setting a bogus value. But that said, no it was not
observable until I tried to use it.

I'm fine if that's dropped. I just wanted it set right going forward :)


> > Signed-off-by: Phil Auld <pauld@...hat.com>
> > Reviewed-by: Ben Segall <bsegall@...gle.com>
> > Cc: Ingo Molnar <mingo@...hat.com>
> > Cc: Peter Zijlstra <peterz@...radead.org>
> > Cc: Vincent Guittot <vincent.guittot@...aro.org>
> > Cc: Juri Lelli <juri.lelli@...hat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@....com>
> > Cc: Valentin Schneider <vschneid@...hat.com>
> > Cc: Ben Segall <bsegall@...gle.com>
> > Cc: Frederic Weisbecker <frederic@...nel.org>
> > Cc: Tejun Heo <tj@...nel.org>
> 
> Acked-by: Tejun Heo <tj@...nel.org>
>

Thanks!


> > +		 * always take the non-RUNTIME_INF min.  On cgroup1, only
> > +		 * inherit when no limit is set. In both cases this is used
> > +		 * by the scheduler to determine if a given CFS task has a
> > +		 * bandwidth constraint at some higher level.
> 
> The discussion on this comment is stretching too long and this is fine too
> but what's worth commenting for cgroup2 is that the limit value itself
> doesn't mean anything and we're just latching onto the value used by cgroup1
> to track whether there's any limit active or not.

I thought that was implied by the wording. "If a given task has a bandwidth
contraint" not "what a given task's bandwidth constraint is".  In both cases
that's how the other parts of the scheduler are using it. The actual
non-RUNTIME_INF value only matters in this function (and only for cgroup1
indeed).

But... the value is just as accurate for cgroup2 and cgroup1.  The task is
still going to be limited by that bandwidth constraint even if its own
bandwidth limit is nominally higher, right? 


Cheers,
Phil

> 
> Thanks.
> 
> -- 
> tejun
> 

--