linux-kernel - Re: [PATCH] cgroup: wait for css offline when rmdir

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YpZOIz/CbQs+aWF6@slm.duckdns.org>
Date:   Tue, 31 May 2022 07:19:31 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Hongchen Zhang <zhanghongchen@...ngson.cn>
Cc:     Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cgroup: wait for css offline when rmdir

Hello,

On Tue, May 31, 2022 at 11:49:53AM +0800, Hongchen Zhang wrote:
> Yes, the problem would disappear when add some reasonable delay. But I think

It'd be better to wait for some group of operations to complete than
inserting explicit delays.

> if we can increase the MEM_CGROUP_ID_MAX to INT_MAX.Thus the -ENOMEM error
> would be never occured,even if the system is out of memory.

Oh, you're hitting the memcg ID limit, not the css one. Memcg id is limited
so that it doesn't consume as many bits in, I guess, struct page. I don't
think it'd make sense to increase overall overhead to solve this rather
artificial problem tho.

Maybe just keep the sequence numbers for started and completed offline
operations and wait for completed# to reach the started# on memcg alloc
failure and retry? Note that we can get live locked, so have to remember the
sequence number to wait for at the beginning. Or, even simpler, maybe it'd
be enough to just do synchronize_rcu() and then wait for the offline wait
once and retry.

Thanks.

-- 
tejun