Message-ID: <20231003125444.GB17012@cmpxchg.org>
Date: Tue, 3 Oct 2023 08:54:44 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Nhat Pham <nphamcs@...il.com>
Cc: akpm@...ux-foundation.org, riel@...riel.com, mhocko@...nel.org,
roman.gushchin@...ux.dev, shakeelb@...gle.com,
muchun.song@...ux.dev, tj@...nel.org, lizefan.x@...edance.com,
shuah@...nel.org, mike.kravetz@...cle.com, yosryahmed@...gle.com,
fvdl@...gle.com, linux-mm@...ck.org, kernel-team@...a.com,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in
memory controller
On Mon, Oct 02, 2023 at 05:18:27PM -0700, Nhat Pham wrote:
> Currently, hugetlb memory usage is not accounted for in the memory
> controller, which could lead to memory overprotection for cgroups with
> hugetlb-backed memory. This has been observed in our production system.
>
> For instance, here is one of our use cases: suppose there are two 32G
> containers. The machine is booted with hugetlb_cma=6G, and each
> container may or may not use up to 3 gigantic pages, depending on the
> workload within it. The rest is anon, cache, slab, etc. We can set the
> hugetlb cgroup limit of each cgroup to 3G to enforce hugetlb fairness.
> But it is very difficult to configure memory.max to keep overall
> consumption, including anon, cache, slab, etc., fair.
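>
> (Illustrative only: a minimal sketch of that per-container setup,
> assuming a cgroup v2 hierarchy at /sys/fs/cgroup and a hypothetical
> container cgroup "ctr1". hugetlb.1GB.max and memory.max are the real
> cgroup v2 interface files; the path and values are examples.)
>
>     #include <stdio.h>
>     #include <stdlib.h>
>
>     /* Write a value into a control file of the "ctr1" cgroup. */
>     static void cg_write(const char *file, const char *val)
>     {
>         char path[256];
>         FILE *f;
>
>         snprintf(path, sizeof(path), "/sys/fs/cgroup/ctr1/%s", file);
>         f = fopen(path, "w");
>         if (!f) {
>             perror(path);
>             exit(1);
>         }
>         fprintf(f, "%s\n", val);
>         fclose(f);
>     }
>
>     int main(void)
>     {
>         /* hugetlb fairness: 3 x 1G gigantic pages per container */
>         cg_write("hugetlb.1GB.max", "3221225472");
>         /* memory.max has to be guessed: hugetlb usage is invisible
>          * to it, so an agent must poll and readjust it constantly. */
>         cg_write("memory.max", "34359738368");
>         return 0;
>     }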
>
> What we have had to resort to is constantly polling hugetlb usage and
> readjusting memory.max. A similar procedure is applied to other memory
> limits (memory.low, for example). However, this is rather cumbersome
> and buggy. Furthermore, when there is a delay in correcting the memory
> limits (for example, when hugetlb usage changes between consecutive
> runs of the userspace agent), the system could be left in an over- or
> underprotected state.
>
> This patch rectifies this issue by charging the memcg when the hugetlb
> folio is utilized, and uncharging when the folio is freed (analogous to
> the hugetlb controller). Note that we do not charge when the folio is
> allocated to the hugetlb pool, because at this point it is not owned by
> any memcg.
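>
> (Not part of the patch: a runnable userspace sketch of where the
> charge now happens, assuming a kernel with this series, the
> accounting option enabled on a cgroup v2 mount, and a free 2M
> hugetlb page in the pool.)
>
>     #define _GNU_SOURCE
>     #include <stdio.h>
>     #include <sys/mman.h>
>
>     int main(void)
>     {
>         size_t len = 2UL << 20; /* one 2M hugetlb page */
>
>         /* Reserves a page from the hugetlb pool; with this series
>          * the memcg is not charged yet at this point. */
>         char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
>                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
>         if (p == MAP_FAILED) {
>             perror("mmap");
>             return 1;
>         }
>
>         /* The fault pulls a folio out of the pool for this process;
>          * this is where memory.current goes up by 2M. */
>         p[0] = 1;
>
>         /* Frees the folio back to the pool and uncharges the memcg. */
>         munmap(p, len);
>         return 0;
>     }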
>
> Some caveats to consider:
> * This feature is only available on cgroup v2.
> * There is no hugetlb pool management involved in the memory
>   controller. As stated above, hugetlb folios are only charged towards
>   the memory controller when they are used. Host overcommit management
>   has to take this into account when configuring hard limits.
> * Failure to charge towards the memcg results in SIGBUS. This could
>   happen even if the hugetlb pool still has free pages (but the cgroup
>   limit is hit and the reclaim attempt fails); see the sketch after
>   this list.
> * When this feature is enabled, hugetlb pages contribute to memory
>   reclaim protection, so memory.low and memory.min tuning must take
>   hugetlb memory into account.
> * Hugetlb pages utilized while this option is not selected will not
> be tracked by the memory controller (even if cgroup v2 is remounted
> later on).
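>
> (A sketch of the SIGBUS caveat above, under the same assumptions as
> the previous example, with memory.max set low enough that the charge
> at fault time fails: the process gets SIGBUS rather than an mmap()
> error, so it must be prepared to catch it.)
>
>     #define _GNU_SOURCE
>     #include <setjmp.h>
>     #include <signal.h>
>     #include <stdio.h>
>     #include <sys/mman.h>
>
>     static sigjmp_buf env;
>
>     static void on_sigbus(int sig)
>     {
>         (void)sig;
>         siglongjmp(env, 1); /* restores the signal mask, too */
>     }
>
>     int main(void)
>     {
>         size_t len = 2UL << 20; /* one 2M hugetlb page */
>         struct sigaction sa = { 0 };
>         char *p;
>
>         sa.sa_handler = on_sigbus;
>         sigemptyset(&sa.sa_mask);
>         sigaction(SIGBUS, &sa, NULL);
>
>         p = mmap(NULL, len, PROT_READ | PROT_WRITE,
>                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
>         if (p == MAP_FAILED) {
>             perror("mmap");
>             return 1;
>         }
>
>         if (sigsetjmp(env, 1)) {
>             /* Charge failed: the pool may still have free pages,
>              * but the cgroup limit was hit and reclaim failed. */
>             fprintf(stderr, "SIGBUS on hugetlb fault\n");
>             return 1;
>         }
>         p[0] = 1; /* fault: allocate from the pool and charge */
>         printf("faulted in and charged\n");
>         munmap(p, len);
>         return 0;
>     }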
>
> Signed-off-by: Nhat Pham <nphamcs@...il.com>
Acked-by: Johannes Weiner <hannes@...xchg.org>