linux-kernel - Re: Cgroups "pids" controller does not update "pids.current" count immediately

Message-ID: <d2c7a301-1d89-7019-8fca-3f34c853ce1a@icdsoft.com>
Date:   Fri, 15 Jun 2018 22:38:04 +0300
From:   Ivan Zahariev <famzah@...soft.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Cgroups "pids" controller does not update "pids.current" count
 immediately

Hello,


On 15.6.2018 at 22:07, Tejun Heo wrote:
> On Fri, Jun 15, 2018 at 08:40:02PM +0300, Ivan Zahariev wrote:
>> The lazy pids accounting combined with modern fast CPUs makes the
>> "pids.current" metric practically unusable for resource limiting in
>> our case. In a test where we very quickly started and ended a single
>> process, we saw "pids.current" reach values as high as 185 (while the
>> correct value at all times is either 0 or 1). If we want a cgroup to
>> be able to spawn at most 50 processes, we have to set "pids.max" to a
>> much higher value like 300 to compensate for the pids uncharge lag
>> (and the required margin depends on the speed of the CPU and how busy
>> the system is).
> Yeah, that actually makes a lot of sense.  We can't keep everything
> synchronous for obvious performance reasons but we definitely can wait
> for RCU grace period before failing.  Forking might become a bit
> slower while pids are draining but shouldn't fail and that shouldn't
> incur any performance overhead in normal conditions when pids aren't
> constrained.
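The failure mode and the proposed fix can be illustrated with a toy model. This is a hypothetical sketch, not kernel code: `PidsModel`, `simulate` and `gp_interval` are invented names, a "grace period" is modeled simply as applying all queued uncharges every `gp_interval` fork cycles, and real RCU timing is not modeled. It compares failing a fork immediately at the limit with draining pending uncharges before failing.

```python
from dataclasses import dataclass


@dataclass
class PidsModel:
    """Hypothetical model (not kernel code) of a pids counter with lagging uncharges."""
    limit: int        # plays the role of pids.max
    current: int = 0  # what pids.current would report: live + exited-but-not-uncharged
    pending: int = 0  # exited tasks whose uncharge is still queued

    def exit_task(self) -> None:
        # The task is gone, but its uncharge is deferred, so current stays high.
        self.pending += 1

    def drain(self) -> None:
        # Stand-in for a grace period elapsing: all queued uncharges are applied.
        self.current -= self.pending
        self.pending = 0

    def try_fork(self, wait_before_fail: bool) -> bool:
        if self.current < self.limit:
            self.current += 1
            return True
        if wait_before_fail and self.pending:
            # Instead of failing, wait for pending uncharges and re-check once.
            self.drain()
            return self.try_fork(wait_before_fail=False)
        return False


def simulate(max_pids: int, cycles: int, gp_interval: int, wait_before_fail: bool):
    """Fork and immediately exit one process `cycles` times.

    Queued uncharges are applied every `gp_interval` cycles. Returns
    (failed_forks, peak_pids_current); at most one process is ever alive.
    """
    model = PidsModel(limit=max_pids)
    failures = peak = 0
    for i in range(cycles):
        if i % gp_interval == 0:
            model.drain()
        if not model.try_fork(wait_before_fail):
            failures += 1
            continue
        peak = max(peak, model.current)
        model.exit_task()
    return failures, peak
```

Under these assumptions, `simulate(50, 1000, 200, False)` returns `(750, 50)`: 750 forks fail although only one process is ever alive. Raising the limit to 300 avoids the failures but lets the reported count climb to 200 (`(0, 200)`), while draining before failing keeps a limit of 50 usable (`(0, 50)`).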

I lack the expertise to comment on this. As a system administrator, I can
only point out that machines with 80+ CPU cores are common nowadays. I
don't know how the RCU grace period scales with an increasing number of
CPUs.

If you develop a patch for this, we can try it in production and give 
you feedback. Just send me an email notification.

Thank you for your time and attention!

--
Ivan