Message-ID: <CABk29Ns1VWEVRYENud4CW3JQPrcr79i_F2PBTANqt3t-LaYCfQ@mail.gmail.com>
Date: Tue, 1 Nov 2022 12:11:30 -0700
From: Josh Don <joshdon@...gle.com>
To: Tejun Heo <tj@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org,
Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth

On Mon, Oct 31, 2022 at 6:46 PM Tejun Heo <tj@...nel.org> wrote:
>
> On Mon, Oct 31, 2022 at 06:01:19PM -0700, Josh Don wrote:
> > > Yeah, especially with narrow cpuset (or task cpu affinity) configurations,
> > > it can get pretty bad. Outside that tho, at least I haven't seen a lot of
> > > problematic cases as long as the low priority one isn't tightly entangled
> > > with high priority tasks, mostly because 1. if the resource the low pri one
> > > is holding affects a large part of the system, the problem is self-solving as
> > > the system quickly runs out of other things to do, and 2. if the resource
> > > isn't affecting a large part of the system, its blast radius is usually
> > > reasonably confined to things tightly coupled with it. I'm sure there are
> > > exceptions
> > > and we definitely wanna improve the situation where it makes sense.
> >
> > cgroup_mutex and kernfs rwsem beg to differ :) These are shared with
> > control plane threads, so it is pretty easy to starve those out even
> > while the system has plenty of work to do.
>
> Hahaha yeah, good point. We definitely wanna improve them. There were some
> efforts to improve kernfs locking granularity earlier this year. It was
> promising but didn't get to the finish line. As for cgroup_mutex, with cgroup2
> and especially the optimizations around CLONE_INTO_CGROUP, we avoid it in most
> hot paths, and hopefully that should help quite a bit. If it continues
> to be a problem, we definitely wanna further improve it.
>
> Just to better understand the situation, can you give some more details on
> the scenarios where cgroup_mutex was in the middle of a shitshow?
There have been a couple; I think one of the main ones has been writes
to cgroup.procs. cpuset modifications also show up, since those
serialize on a mutex as well.
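
To make the two paths concrete, here's a rough userspace sketch (not
from the patch; the cgroup paths and helper names are made up for
illustration): (a) is the plain write into cgroup.procs, i.e. the
migration path that ends up serializing on cgroup_mutex, and (b) is
the clone3(CLONE_INTO_CGROUP) route you mention, which attaches the
child at fork time and skips the separate migration write:

/*
 * Rough sketch only, not part of the patch: compares the two ways a
 * task ends up in a cgroup.  Cgroup paths, helper names and error
 * handling are illustrative.  CLONE_INTO_CGROUP needs a v5.7+ kernel.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/sched.h>        /* struct clone_args, CLONE_INTO_CGROUP */

/* (a) migrate an existing task via cgroup.procs (the contended path) */
static int move_pid_to_cgroup(const char *cgrp_dir, pid_t pid)
{
        char path[256];
        int fd, ret;

        snprintf(path, sizeof(path), "%s/cgroup.procs", cgrp_dir);
        fd = open(path, O_WRONLY | O_CLOEXEC);
        if (fd < 0)
                return -1;
        ret = dprintf(fd, "%d\n", pid) > 0 ? 0 : -1;
        close(fd);
        return ret;
}

/* (b) start the task directly in the target cgroup via clone3() */
static pid_t spawn_in_cgroup(const char *cgrp_dir)
{
        struct clone_args args;
        int cgrp_fd;
        pid_t pid;

        cgrp_fd = open(cgrp_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (cgrp_fd < 0)
                return -1;

        memset(&args, 0, sizeof(args));
        args.flags = CLONE_INTO_CGROUP;
        args.exit_signal = SIGCHLD;
        args.cgroup = cgrp_fd;

        /* clone3() has no glibc wrapper, so go through syscall(2) */
        pid = syscall(__NR_clone3, &args, sizeof(args));
        close(cgrp_fd);
        return pid;     /* 0 in the child, child's pid in the parent */
}

That helps for launching tasks into the right group, but the
control-plane writes above (cgroup.procs for existing tasks, cpuset
updates) still have to take the mutexes, which is where we see the
starvation.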
>
> Thanks.
>
> --
> tejun