linux-kernel - Re: [Question] The system may be stuck if there is a cpu cgroup cpu.cfs_quato


Message-ID: <YrlrBmF3oOfS3+fq@mtj.duckdns.org>
Date:   Mon, 27 Jun 2022 17:32:06 +0900
From:   Tejun Heo <tj@...nel.org>
To:     Zhang Qiao <zhangqiao22@...wei.com>
Cc:     mingo@...hat.com, peterz@...radead.org,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        lizefan.x@...edance.com, hannes@...xchg.org,
        cgroups@...r.kernel.org, lkml <linux-kernel@...r.kernel.org>,
        vschneid@...hat.com, dietmar.eggemann@....com, bristot@...hat.com,
        bsegall@...gle.com, Steven Rostedt <rostedt@...dmis.org>,
        mgorman@...e.de
Subject: Re: [Question] The system may be stuck if there is a cpu cgroup
 cpu.cfs_quato_us is very low

Hello,

On Mon, Jun 27, 2022 at 02:50:25PM +0800, Zhang Qiao wrote:
> Because the task cgroup's cpu.cfs_quota_us is very small and
> test_fork's load is very heavy, test_fork may be throttled for a long
> time; therefore, the cgroup_threadgroup_rw_sem read lock is held for
> a long time, and other processes will get stuck waiting for the lock:

Yeah, this is a known problem and can happen with other locks too. The
solution is probably to throttle only while in, or when about to return
to, userspace. There is one really important and widespread assumption in
the kernel:

  If things get blocked on some shared resource, whatever is holding
  the resource ends up using more of the system to exit the critical
  section faster and thus unblocks others ASAP. IOW, things running in
  kernel are work-conserving.

The cpu bandwidth controller gives userspace a rather easy way to break
this assumption and thus is rather fundamentally broken. This is
basically the same problem we had with the old cgroup freezer
implementation which trapped threads in random locations in the
kernel.
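
To make the effect concrete, here is a toy discrete-time model, not kernel
code. It assumes one lock holder inside a quota-limited cgroup and compares
throttling it anywhere (including while it holds the lock) with deferring
the throttle until it is back in userspace. The function name, its
parameters, and the 1 ms tick are all hypothetical; with 1 ms ticks,
quota=1 and period=100 roughly correspond to cpu.cfs_quota_us=1000 and
cpu.cfs_period_us=100000.

```python
def ticks_until_lock_release(quota, period, hold, throttle_in_kernel):
    """Return the wall-clock tick at which the lock holder drops the lock.

    quota              -- CPU ticks the cgroup may use per period
    period             -- length of one quota period, in ticks
    hold               -- CPU ticks of in-kernel work needed before unlock
    throttle_in_kernel -- True: throttle even while the lock is held;
                          False: defer throttling until back in userspace
    """
    used = 0          # CPU ticks consumed in the current period
    remaining = hold  # in-kernel work left before the lock is released
    tick = 0
    while remaining > 0:
        if tick % period == 0:
            # Quota refill at each period boundary. Simplification: any
            # overrun from the deferred policy is forgiven here, which
            # does not change when the lock is released.
            used = 0
        throttled = throttle_in_kernel and used >= quota
        if not throttled:
            used += 1
            remaining -= 1
        tick += 1
    return tick


if __name__ == "__main__":
    # The holder needs 5 ticks of CPU time to leave the critical section.
    print(ticks_until_lock_release(1, 100, 5, throttle_in_kernel=True))   # 401
    print(ticks_until_lock_release(1, 100, 5, throttle_in_kernel=False))  # 5
```

In this model every waiter on the lock is stalled for about four full
periods under the first policy, even though the critical section itself is
only 5 ticks long. Under the second policy the holder stays
work-conserving while in the kernel and pays for its overrun later.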

So, right now, it's rather broken and can easily be used as a DoS
attack vector.

Thanks.

-- 
tejun