lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZRQQMABiVIcXXcrg@dhcp22.suse.cz>
Date:   Wed, 27 Sep 2023 13:21:20 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Nhat Pham <nphamcs@...il.com>
Cc:     akpm@...ux-foundation.org, riel@...riel.com, hannes@...xchg.org,
        roman.gushchin@...ux.dev, shakeelb@...gle.com,
        muchun.song@...ux.dev, tj@...nel.org, lizefan.x@...edance.com,
        shuah@...nel.org, mike.kravetz@...cle.com, yosryahmed@...gle.com,
        linux-mm@...ck.org, kernel-team@...a.com,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH 0/2] hugetlb memcg accounting

On Tue 26-09-23 12:49:47, Nhat Pham wrote:
> Currently, hugetlb memory usage is not acounted for in the memory
> controller, which could lead to memory overprotection for cgroups with
> hugetlb-backed memory. This has been observed in our production system.
> 
> This patch series rectifies this issue by charging the memcg when the
> hugetlb folio is allocated, and uncharging when the folio is freed. In
> addition, a new selftest is added to demonstrate and verify this new
> behavior.

The primary reason why hugetlb is living outside of memcg (and the core
MM as well) is that it doesn't really fit the whole scheme. In several
aspects. First and the foremost it is an independently managed resource
with its own pool management, use and lifetime.

There is no notion of memory reclaim and this makes a huge difference
for the pool that might consume considerable amount of memory. While
this is the case for many kernel allocations as well they usually do not
consume considerable portions of the accounted memory. This makes it
really tricky to handle limit enforcement gracefully.

Another important aspect comes from the lifetime semantics when a proper
reservations accounting and managing needs to handle mmap time rather
than than usual allocation path. While pages are allocated they do not
belong to anybody and only later at the #PF time (or read for the fs
backed mapping) the ownership is established. That makes it really hard
to manage memory as whole under the memcg anyway as a large part of
that pool sits without an ownership yet it cannot be used for any other
purpose.

These and more reasons where behind the earlier decision o have a
dedicated hugetlb controller.

Also I will also Nack involving hugetlb pages being accounted by
default. This would break any setups which mix normal and hugetlb memory
with memcg limits applied.
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ