Message-Id: <20241031183413.bb0bc34e8354cc14cdfc3c29@linux-foundation.org>
Date: Thu, 31 Oct 2024 18:34:13 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Joshua Hahn <joshua.hahnjy@...il.com>
Cc: Michal Hocko <mhocko@...e.com>, Johannes Weiner <hannes@...xchg.org>,
nphamcs@...il.com, shakeel.butt@...ux.dev, roman.gushchin@...ux.dev,
muchun.song@...ux.dev, tj@...nel.org, lizefan.x@...edance.com,
mkoutny@...e.com, corbet@....net, lnyng@...a.com, cgroups@...r.kernel.org,
linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH v3 1/1] memcg/hugetlb: Adding hugeTLB counters to memcg
On Thu, 31 Oct 2024 15:03:34 -0400 Joshua Hahn <joshua.hahnjy@...il.com> wrote:
> Andrew -- I am sorry to ask again, but do you think you can replace
> the 3rd section in the patch (3. Implementation Details) with the
> following paragraphs?
No problem.
: This patch introduces a new counter to memory.stat that tracks hugeTLB
: usage, but only if hugeTLB usage is also accounted in memory.current.
: This feature is enabled the same way hugeTLB accounting is enabled, via
: the memory_hugetlb_accounting mount flag for cgroup v2.
:
: 1. Why is this patch necessary?
: Currently, memcg hugeTLB accounting is an opt-in feature [1] that adds
: hugeTLB usage to memory.current. However, the metric is not reported in
: memory.stat. Given that users often interpret memory.stat as a breakdown
: of the value reported in memory.current, the disparity between the two
: reports can be confusing. This patch solves this problem by including the
: metric in memory.stat as well, but only if it is also reported in
: memory.current (it would also be confusing if the value were reported in
: memory.stat, but not in memory.current).
:
: Aside from the consistency between the two files, we also see benefits in
: observability. Userspace might be interested in the hugeTLB footprint of
: cgroups for many reasons. For instance, system admins might want to
: verify that hugeTLB usage is distributed as expected across tasks: i.e.
: memory-intensive tasks are using more hugeTLB pages than tasks that don't
: consume a lot of memory, or are seen to fault frequently. Note that this
: is separate from wanting to inspect the distribution for limiting purposes
: (in which case, the hugeTLB controller makes more sense).
:
: 2. We already have a hugeTLB controller. Why not use that?
: It is true that the hugeTLB controller tracks the exact value that we
: want. In fact, by enabling the hugeTLB controller, we get all of the
: observability benefits that I mentioned above, and users can check the
: total hugeTLB usage, verify if it is distributed as expected, etc.
:
: With this said, there are 2 problems:
: (a) The values are still not reported in memory.stat, which means the
: disparity between the memcg reports is still there.
: (b) We cannot reasonably expect users to enable the hugeTLB controller
: just for the sake of hugeTLB usage reporting, especially since
: they don't have any use for hugeTLB usage enforcing [2].
:
: 3. Implementation Details:
: In the alloc / free hugetlb functions, we call lruvec_stat_mod_folio
: regardless of whether memcg accounts hugetlb. mem_cgroup_commit_charge,
: which is called from alloc_hugetlb_folio, will set memcg for the folio
: only if the CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING cgroup mount option is
: used, so lruvec_stat_mod_folio accounts per-memcg hugetlb counters only
: if the feature is enabled. Regardless of whether memcg accounts for
: hugetlb, the newly added global counter is updated and shown in
: /proc/vmstat.
:
: The global counter is added because vmstats is the preferred framework
: for cgroup stats. It makes stat items consistent between global and
: cgroups. It also provides a per-node breakdown, which is useful.
: Because it does not use cgroup-specific hooks, we also keep generic MM
: code separate from memcg code.
:
: [1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@gmail.com/
: [2] Of course, we can't make a new patch for every feature that can be
: duplicated. However, since the existing solution of enabling the
: hugeTLB controller is an imperfect solution that still leaves a
: discrepancy between memory.stat and memory.current, I think that it
: is reasonable to isolate the feature in this case.
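
For readers who want the accounting rule from "Implementation Details" in
concrete form, here is a minimal userspace model in C. It is only a sketch
and not the kernel code: every type and function name below (memcg_model,
folio_model, the model_* helpers, the accounting_enabled flag) is
hypothetical and merely stands in for the kernel functions named in the
changelog. It assumes the charge is committed before the stat update, so
that the global counter always moves while the per-memcg counter moves only
when the folio was bound to a memcg.

```c
/* Userspace model of the accounting rule. NOT kernel code.
 * All names here are hypothetical stand-ins for illustration. */
#include <assert.h>
#include <stddef.h>

struct memcg_model {
    long nr_hugetlb;            /* per-cgroup counter (memory.stat side) */
};

struct folio_model {
    struct memcg_model *memcg;  /* NULL when memcg does not account hugetlb */
    long nr_pages;
};

/* Stands in for the global counter shown in /proc/vmstat. */
long global_nr_hugetlb;

/* Models mem_cgroup_commit_charge(): the folio is bound to a memcg only
 * when the hugetlb accounting mount option is in effect. */
void model_commit_charge(struct folio_model *folio,
                         struct memcg_model *memcg, int accounting_enabled)
{
    folio->memcg = accounting_enabled ? memcg : NULL;
}

/* Models lruvec_stat_mod_folio(): the global counter is always updated,
 * the per-memcg counter only if the folio has a memcg. */
void model_stat_mod_folio(struct folio_model *folio, long delta)
{
    global_nr_hugetlb += delta;
    if (folio->memcg)
        folio->memcg->nr_hugetlb += delta;
}

/* Models the alloc path: commit the charge, then update the stats. */
void model_alloc_hugetlb_folio(struct folio_model *folio,
                               struct memcg_model *memcg,
                               long nr_pages, int accounting_enabled)
{
    folio->nr_pages = nr_pages;
    model_commit_charge(folio, memcg, accounting_enabled);
    model_stat_mod_folio(folio, nr_pages);
}

/* Models the free path: undo the stat update, then drop the binding. */
void model_free_hugetlb_folio(struct folio_model *folio)
{
    model_stat_mod_folio(folio, -folio->nr_pages);
    assert(global_nr_hugetlb >= 0);  /* sanity check: no underflow */
    folio->memcg = NULL;
}
```

In this model, allocating one folio with accounting_enabled set to 1 raises
both counters, while allocating with it set to 0 raises only
global_nr_hugetlb. That matches the behavior the changelog describes for
memory.stat versus /proc/vmstat.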