Message-ID: <CAPTztWY8eDSa1qKx35hTm5ef+e13SDnRHDrevc-1V1v7-pEP3w@mail.gmail.com>
Date:   Tue, 26 Sep 2023 13:50:10 -0700
From:   Frank van der Linden <fvdl@...gle.com>
To:     Nhat Pham <nphamcs@...il.com>
Cc:     akpm@...ux-foundation.org, riel@...riel.com, hannes@...xchg.org,
        mhocko@...nel.org, roman.gushchin@...ux.dev, shakeelb@...gle.com,
        muchun.song@...ux.dev, tj@...nel.org, lizefan.x@...edance.com,
        shuah@...nel.org, mike.kravetz@...cle.com, yosryahmed@...gle.com,
        linux-mm@...ck.org, kernel-team@...a.com,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH 0/2] hugetlb memcg accounting

On Tue, Sep 26, 2023 at 12:49 PM Nhat Pham <nphamcs@...il.com> wrote:
>
> Currently, hugetlb memory usage is not accounted for in the memory
> controller, which could lead to memory overprotection for cgroups with
> hugetlb-backed memory. This has been observed in our production system.
>
> This patch series rectifies this issue by charging the memcg when the
> hugetlb folio is allocated, and uncharging when the folio is freed. In
> addition, a new selftest is added to demonstrate and verify this new
> behavior.
>
> Nhat Pham (2):
>   hugetlb: memcg: account hugetlb-backed memory in memory controller
>   selftests: add a selftest to verify hugetlb usage in memcg
>
>  MAINTAINERS                                   |   2 +
>  fs/hugetlbfs/inode.c                          |   2 +-
>  include/linux/hugetlb.h                       |   6 +-
>  include/linux/memcontrol.h                    |   8 +
>  mm/hugetlb.c                                  |  23 +-
>  mm/memcontrol.c                               |  40 ++++
>  tools/testing/selftests/cgroup/.gitignore     |   1 +
>  tools/testing/selftests/cgroup/Makefile       |   2 +
>  .../selftests/cgroup/test_hugetlb_memcg.c     | 222 ++++++++++++++++++
>  9 files changed, 297 insertions(+), 9 deletions(-)
>  create mode 100644 tools/testing/selftests/cgroup/test_hugetlb_memcg.c
>
> --
> 2.34.1
>
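(For anyone who wants to see the charging path described in the cover
letter in action, a rough userspace sketch along these lines should do
it. This is not the selftest from patch 2; the 2 MB default huge page
size, a pre-populated hugepage pool, and the idea of watching
memory.current before and after are assumptions on my part. Run it
inside a cgroup and compare memory.current while it sleeps.)

    /* Rough sketch: map and touch one anonymous hugetlb page so a
     * hugetlb folio is allocated; per the cover letter, the series
     * would charge the caller's memcg at that point.
     * Assumes a 2 MB default huge page size and that hugepages have
     * been reserved (e.g. via /proc/sys/vm/nr_hugepages).
     */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 2UL * 1024 * 1024;   /* one 2 MB huge page (assumed default size) */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");
            return 1;
        }

        memset(p, 0, len);   /* fault in the hugetlb folio; this is where the charge would land */
        printf("hugetlb page mapped and touched, inspect memory.current now\n");
        pause();             /* keep the mapping alive while the cgroup is inspected */

        munmap(p, len);
        return 0;
    }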

We've had this behavior at Google for a long time, and we're actually
getting rid of it. hugetlb pages are a precious resource that should
be accounted for separately. They are not just any memory, they are
physically contiguous memory, and charging them the same as any other
region of the same size ended up not making sense, especially for
larger hugetlb page sizes.

Additionally, if this behavior is changed just like that, there will
be quite a few workloads that will break badly because they'll hit
their limits immediately - imagine a container that uses 1G hugetlb
pages to back something large (a database, a VM), and 'plain' memory
for control processes.

What do your workloads do? Is it not possible for you to account for
hugetlb pages separately? Sure, it can be annoying to have to track
two separate totals, but again, hugetlb pages are a resource that is
best dealt with separately.
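For reference, the existing hugetlb cgroup controller already exposes
per-size usage alongside (but separate from) the memory controller.
Something like the sketch below reads both - a minimal sketch assuming
cgroup v2 with the hugetlb controller enabled, 2 MB huge pages, and a
group at /sys/fs/cgroup/mygroup; the path and the size suffix are
assumptions and will differ per system.

    /* Minimal sketch: read hugetlb usage and memcg usage side by side.
     * Assumes cgroup v2, the hugetlb controller enabled, 2 MB huge
     * pages, and a group at /sys/fs/cgroup/mygroup (all assumptions).
     */
    #include <stdio.h>

    static long read_val(const char *path)
    {
        FILE *f = fopen(path, "r");
        long v = -1;

        if (f) {
            if (fscanf(f, "%ld", &v) != 1)
                v = -1;
            fclose(f);
        }
        return v;
    }

    int main(void)
    {
        const char *grp = "/sys/fs/cgroup/mygroup";
        char path[256];

        snprintf(path, sizeof(path), "%s/hugetlb.2MB.current", grp);
        printf("hugetlb.2MB.current = %ld bytes\n", read_val(path));

        snprintf(path, sizeof(path), "%s/memory.current", grp);
        printf("memory.current      = %ld bytes\n", read_val(path));
        return 0;
    }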

- Frank
