lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <76525b1a-6857-434d-86ee-3c2ff4db0e4c@linux.microsoft.com>
Date:   Wed, 20 Sep 2023 12:21:37 +0200
From:   Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Michal Hocko <mhocko@...e.com>
Cc:     stable@...r.kernel.org, patches@...ts.linux.dev,
        Shakeel Butt <shakeelb@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <muchun.song@...ux.dev>, Tejun Heo <tj@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, regressions@...ts.linux.dev,
        mathieu.tortuyaux@...il.com
Subject: Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop
 kmem.limit_in_bytes

On 9/20/2023 11:25 AM, Greg Kroah-Hartman wrote:
> On Wed, Sep 20, 2023 at 10:43:56AM +0200, Michal Hocko wrote:
>> On Wed 20-09-23 01:11:01, Jeremi Piotrowski wrote:
>>> On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
>>>> 6.1-stable review patch.  If anyone has any objections, please let me know.
>>>>
>>>> ------------------
>>>
>>> Hi Greg/Michal,
>>>
>>> This commit breaks userspace which makes it a bad commit for mainline and an
>>> even worse commit for stable.
>>>
>>> We ingested 6.1.54 into our nightly testing and found that runc fails to gather
>>> cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
>>> into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
>>> fine.
>>
>> Could you expand some more on why is the file read? It doesn't support
>> writing to it for some time so how does reading it helps in any sense?
>>
>> Anyway, I do agree that the stable backport should be reverted.
> 
> That will just postpone the breakage, we really shouldn't break
> userspace.
> 
> That being said, having userspace "break" because a file is no longer
> present is not good coding style on the userspace side at all.  That's
> why we have sysfs and single-value-files now, if the file isn't present,
> then userspace instantly notices and can handle it.  Much easier than
> the old-style multi-fields-in-one-file problem.
> 

The memcg files in this case are single-value, but userspace expects to be able
to read memcg limits when it can read the usage (indicating MEMCG is enabled).
If it can't - then something is off, and the node is marked unhealthy.

>>>> Address this by wiping out the file completely and effectively get back to
>>>> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
> 
> The fact that this is a valid option (i.e. no file) with that config
> option disabled makes me want to keep this as well, as how does
> userspace handle this option disabled at all?  Or old kernels?
> 

Userspace has had to handle the case of MEMCG_KMEM=n, but that had 2 cases so far:

limits/usage/max_usage/failcnt files are all available or none of them are available.

Now it needs to handle 3 of 4 files being available, but only for kmem (and not plain
memory, memsw or kmem.tcp). That's an inconsistency.

> I can drop this from stable kernels, but again, this feels like the runc
> developers are just postponing the problem...
>

Since cgroups v1 is deprecated, I think the runc developers haven't touched this part
of the code in years and expected it to keep working while they wait for the long tail
of usage to die out.

> thanks,
> 
> greg k-h

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ