Message-ID: <xm26czeoioju.fsf@google.com>
Date: Fri, 01 Jul 2022 13:08:21 -0700
From: Benjamin Segall <bsegall@...gle.com>
To: Zhang Qiao <zhangqiao22@...wei.com>
Cc: Tejun Heo <tj@...nel.org>, <mingo@...hat.com>,
<peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
<lizefan.x@...edance.com>, <hannes@...xchg.org>,
<cgroups@...r.kernel.org>, lkml <linux-kernel@...r.kernel.org>,
<vschneid@...hat.com>, <dietmar.eggemann@....com>,
<bristot@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
<mgorman@...e.de>
Subject: Re: [Question] The system may be stuck if there is a cpu cgroup
cpu.cfs_quato_us is very low
Zhang Qiao <zhangqiao22@...wei.com> writes:
> Hi, tejun
>
> Thanks for your reply.
>
> On 2022/6/27 16:32, Tejun Heo wrote:
>> Hello,
>>
>> On Mon, Jun 27, 2022 at 02:50:25PM +0800, Zhang Qiao wrote:
>>> Because the task cgroup's cpu.cfs_quota_us is very small and
>>> test_fork's load is very heavy, test_fork may be throttled for a
>>> long time. The cgroup_threadgroup_rw_sem read lock is therefore held
>>> for a long time, and other processes get stuck waiting for the lock:
>>
>> Yeah, this is a known problem and can happen with other locks too. The
>> solution is probably to throttle only while in, or when about to
>> return to, userspace. There is one really important and widespread assumption in
>> the kernel:
>>
>> If things get blocked on some shared resource, whatever is holding
>> the resource ends up using more of the system to exit the critical
>> section faster and thus unblocks others ASAP. IOW, things running in
>> kernel are work-conserving.
>>
>> The cpu bw controller gives the userspace a rather easy way to break
>> this assumption and thus is rather fundamentally broken. This is
>> basically the same problem we had with the old cgroup freezer
>> implementation which trapped threads in random locations in the
>> kernel.
>>
>
> So, if we want to completely solve this problem, is the best way to
> change the CFS bandwidth controller's throttle mechanism? For example,
> by throttling tasks at a safe location.
Yes, fixing (kernel) priority inversion due to CFS_BANDWIDTH requires a
serious reworking of how it works, because it would need to dequeue
tasks individually rather than doing the entire cfs_rq at a time (and
would require some effort to avoid pinging every throttling task to get
it into the kernel).
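
To make the cost of the inversion concrete, here is a rough toy model, not kernel code. It counts how many wall-clock ticks a lock stays held when the holder can be throttled inside its critical section, versus when throttling is deferred until it leaves the kernel. The function name, the tick granularity, and the assumption that the lock is taken at the start of a period with a full quota are all made up for illustration; the real CFS_BANDWIDTH accounting is per-cfs_rq and considerably more involved.

```python
def lock_hold_ticks(quota, period, critical_section, throttle_in_kernel):
    """Toy model: wall-clock ticks between lock acquire and lock release.

    quota              -- runnable ticks granted per period (hypothetical unit)
    period             -- length of one bandwidth period in ticks
    critical_section   -- CPU ticks the holder needs before it can unlock
    throttle_in_kernel -- True:  throttle as soon as quota runs out, even
                                 while the lock is held
                          False: let the holder overrun its quota until it
                                 has released the lock (throttle deferred)
    """
    remaining_work = critical_section
    runtime_left = quota
    tick = 0
    while remaining_work > 0:
        if tick > 0 and tick % period == 0:
            # Period boundary: refill, carrying over any overrun as debt.
            runtime_left = min(quota, runtime_left + quota)
        if runtime_left > 0 or not throttle_in_kernel:
            remaining_work -= 1
            runtime_left -= 1
        tick += 1
    return tick


if __name__ == "__main__":
    # Tiny quota, as in the report: 1 tick of runtime per 100-tick period.
    print(lock_hold_ticks(1, 100, 5, throttle_in_kernel=True))   # 401
    print(lock_hold_ticks(1, 100, 5, throttle_in_kernel=False))  # 5
```

With a 1-in-100 quota, a critical section of 5 ticks keeps the lock held for 401 ticks in the first case and 5 in the second, and every waiter on the lock pays the difference. The model ignores the hard part mentioned above, namely dequeuing tasks individually and getting each throttling task to a safe point.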