linux-kernel - Re: [Question] The system may be stuck if there is a cpu cgroup cpu.cfs_quato

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <xm26czeoioju.fsf@google.com>
Date:   Fri, 01 Jul 2022 13:08:21 -0700
From:   Benjamin Segall <bsegall@...gle.com>
To:     Zhang Qiao <zhangqiao22@...wei.com>
Cc:     Tejun Heo <tj@...nel.org>, <mingo@...hat.com>,
        <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        <lizefan.x@...edance.com>, <hannes@...xchg.org>,
        <cgroups@...r.kernel.org>, lkml <linux-kernel@...r.kernel.org>,
        <vschneid@...hat.com>, <dietmar.eggemann@....com>,
        <bristot@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
        <mgorman@...e.de>
Subject: Re: [Question] The system may be stuck if there is a cpu cgroup
 cpu.cfs_quato_us is very low

Zhang Qiao <zhangqiao22@...wei.com> writes:

> Hi, tejun
>
> Thanks for your reply.
>
> On 2022/6/27 16:32, Tejun Heo wrote:
>> Hello,
>> 
>> On Mon, Jun 27, 2022 at 02:50:25PM +0800, Zhang Qiao wrote:
>>> Because the task cgroup's cpu.cfs_quota_us is very small and
>>> test_fork's load is very heavy, test_fork may be throttled for a long
>>> time; as a result, the cgroup_threadgroup_rw_sem read lock is held for
>>> a long time and other processes get stuck waiting for the lock:
>> 
>> Yeah, this is a known problem and can happen with other locks too. The
>> solution is probably to throttle only while in, or about to return to,
>> userspace. There is one really important and widespread assumption in
>> the kernel:
>> 
>>   If things get blocked on some shared resource, whatever is holding
>>   the resource ends up using more of the system to exit the critical
>>   section faster and thus unblocks others ASAP. IOW, things running in
>>   kernel are work-conserving.
>> 
>> The cpu bw controller gives the userspace a rather easy way to break
>> this assumption and thus is rather fundamentally broken. This is
>> basically the same problem we had with the old cgroup freezer
>> implementation which trapped threads in random locations in the
>> kernel.
>> 
>
> So, if we want to completely solve this problem, is the best way to
> change the CFS bandwidth controller's throttling mechanism, for example,
> to throttle tasks at a safe location?

Yes, fixing (kernel) priority inversion due to CFS_BANDWIDTH requires a
serious reworking of how it works, because it would need to dequeue
tasks individually rather than doing the entire cfs_rq at a time (and
would require some effort to avoid pinging every throttling task to get
it into the kernel).
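A rough sense of how long the stall in Zhang Qiao's report can last comes from simple arithmetic. The default cpu.cfs_period_us is 100000 (100 ms), and the smallest cpu.cfs_quota_us the kernel accepts is 1000 (1 ms). The Python sketch below assumes a single task on one CPU that starts at a period boundary, runs uncontended until the quota is spent, and then waits for the next refill. It ignores burst, slack return, and per-CPU runtime slices. `wall_time_to_consume` is a hypothetical helper for illustration, not a kernel interface.

```python
def wall_time_to_consume(cpu_needed_us: int, quota_us: int, period_us: int) -> int:
    """Wall-clock microseconds for a throttled group to receive cpu_needed_us of CPU.

    Simplified single-CPU model (assumption): the work starts at a period
    boundary, runs until the quota is exhausted, then sleeps until the next
    refill. Burst, slack, and per-CPU runtime slices are ignored.
    """
    if not 0 < quota_us <= period_us:
        raise ValueError("model assumes 0 < quota_us <= period_us")
    if cpu_needed_us <= 0:
        return 0
    # Number of periods in which the group gets to run at all (ceiling division).
    periods_used = (cpu_needed_us + quota_us - 1) // quota_us
    # Every period except the last is spent in full: quota of running, then throttled.
    full_periods = periods_used - 1
    # In the last period the group only needs whatever CPU time is left.
    last_chunk = cpu_needed_us - full_periods * quota_us
    return full_periods * period_us + last_chunk


# A critical section needing 5 ms of CPU under a 1 ms / 100 ms limit:
print(wall_time_to_consume(5000, 1000, 100000))  # 401000 us, about 0.4 s
```

Under this model, a critical section that needs only 5 ms of CPU holds the lock for about 0.4 s of wall-clock time, and any task waiting on that lock (here, cgroup_threadgroup_rw_sem) is blocked for roughly as long.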
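The difference between throttling a task anywhere and throttling it only in, or on the way back to, userspace can be illustrated with a toy model. The Python sketch below is not kernel code. `Segment` and `simulate_task` are hypothetical names, and the model assumes a single task on a single CPU with the quota refilled at each period boundary. It also assumes that CPU time overrun while the lock is held is carried as debt and repaid from later refills. That is one plausible way a deferred scheme could charge the overrun, not a description of any posted patch. Because it tracks one task, it does not capture the per-task dequeue work Ben describes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One stretch of CPU work by a single task (hypothetical model type)."""
    holds_lock: bool  # True: running in the kernel inside the critical section
    cpu_us: int       # CPU time this stretch needs, in microseconds


def simulate_task(segments: List[Segment], quota_us: int, period_us: int,
                  defer_to_user: bool) -> Tuple[int, int]:
    """Return (time the lock is last released, time all work finishes), in us.

    defer_to_user=False: the task may be throttled anywhere once quota is spent.
    defer_to_user=True:  the task is never throttled while holding the lock;
                         the overrun is carried as debt into later periods.
    """
    now, period_end = 0, period_us
    runtime = quota_us  # runtime left in this period (negative means debt)
    released_at = 0
    for seg in segments:
        left = seg.cpu_us
        while left > 0:
            exempt = seg.holds_lock and defer_to_user
            if runtime <= 0 and not exempt:
                now = period_end  # throttled: sleep until the next refill
            else:
                # Run until the segment ends, the period ends, or (if not
                # exempt) the remaining runtime is used up.
                run = min(left, period_end - now)
                if not exempt:
                    run = min(run, runtime)
                now += run
                left -= run
                runtime -= run
            if now == period_end:
                # Period boundary: refill the quota, repaying any debt first.
                period_end += period_us
                runtime = min(runtime + quota_us, quota_us)
        if seg.holds_lock:
            released_at = now
    return released_at, now


# User work, then a 5 ms critical section, then more user work,
# under a 1 ms / 100 ms bandwidth limit.
work = [Segment(False, 900), Segment(True, 5000), Segment(False, 1000)]
print(simulate_task(work, 1000, 100000, defer_to_user=False))  # (500900, 600900)
print(simulate_task(work, 1000, 100000, defer_to_user=True))   # (5900, 600900)
```

With these inputs, the lock is held for about 500 ms when the task can be throttled anywhere, but for under 6 ms when throttling is deferred. In both cases the task's remaining user-space work finishes at the same time (600900 us), so in this model the group's bandwidth limit is still enforced over time.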