Message-ID: <ZyNAxnOqOfYvqxjc@tiehlicka>
Date: Thu, 31 Oct 2024 09:33:10 +0100
From: Michal Hocko <mhocko@...e.com>
To: Stepanov Anatoly <stepanov.anatoly@...wei.com>
Cc: Gutierrez Asier <gutierrez.asier@...wei-partners.com>,
akpm@...ux-foundation.org, david@...hat.com, ryan.roberts@....com,
baohua@...nel.org, willy@...radead.org, peterx@...hat.com,
hannes@...xchg.org, hocko@...nel.org, roman.gushchin@...ux.dev,
shakeel.butt@...ux.dev, muchun.song@...ux.dev,
cgroups@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
alexander.kozhevnikov@...wei-partners.com, guohanjun@...wei.com,
weiyongjun1@...wei.com, wangkefeng.wang@...wei.com,
judy.chenhui@...wei.com, yusongping@...wei.com,
artem.kuzin@...wei.com, kang.sun@...wei.com
Subject: Re: [RFC PATCH 0/3] Cgroup-based THP control
On Thu 31-10-24 09:06:47, Stepanov Anatoly wrote:
[...]
> As prctl(PR_SET_THP_DISABLE) can only be used from the calling thread,
> it needs app. developer participation anyway.
> In theory, kind of a launcher-process can be used, to utilize the inheritance
> of the corresponding prctl THP setting, but this seems not transparent
> for the user-space.
No, this is not just in theory. This is a very common usage pattern that
allows changing the behavior of the target application transparently.
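To illustrate the launcher pattern, here is a minimal sketch, assuming a
Linux system with PR_SET_THP_DISABLE available. The helper names
thp_disable_self() and exec_with_thp_disabled() are hypothetical and only
serve the example. Per prctl(2), the flag is inherited by children created
via fork() and preserved across execve(), so the target binary needs no
changes:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Set the "THP disable" flag on the calling process.
 * Returns 0 on success, -1 on error (errno set by prctl()).
 */
int thp_disable_self(void)
{
	return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/*
 * Hypothetical launcher helper: set the flag, then replace the current
 * process image with the target program named by argv[0]. The flag
 * survives the execve(), so the target runs with THP disabled.
 * Only returns (with -1) if something failed.
 */
int exec_with_thp_disabled(char *const argv[])
{
	if (thp_disable_self() != 0) {
		perror("prctl(PR_SET_THP_DISABLE)");
		return -1;
	}
	execvp(argv[0], argv);
	perror("execvp");
	return -1;
}
```

A main() that forwards argv + 1 to exec_with_thp_disabled() would complete
such a wrapper. An analogous "enable" override does not exist today, which
is the gap discussed in this thread.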
> And what if we'd like to enable THP for a specific set of unrelated (in terms of parent-child)
> tasks?
This is what I've had in mind. Currently we only have a THP disable
option. If we really need an override to enforce THP on an application
then this could be a more viable path.
> IMHO, an alternative approach would be changing per-process THP-mode by PID,
> thus also avoiding any user app. changes.
We already have process_madvise. MADV_HUGEPAGE and MADV_COLLAPSE are
not supported there, but we can certainly discuss that option. That
interface requires much more orchestration, though, because it is VMA
range based.
> > You have not really answered a more fundamental question though. Why the
> > THP behavior should be at the cgroup scope? From a practical POV that
> > would represent containers which are a mixed bag of applications to
> > support the workload. Why does the same THP policy apply to all of them?
>
> For THP there're 3 possible levels of fine-control:
> - global THP
> - THP per-group of processes
> - THP per-process
>
> I agree, that in a container, different apps might have different
> THP requirements.
> But it also depends on many factors, such as:
> container "size"(tiny/huge container), diversity of apps/functions inside a container.
> I mean, for some cases, we might not need to go below "per-group" level in terms of THP control.
I am sorry but I do not really see any argument why this should be
per-memcg. Quite the contrary, having that per memcg seems more muddy.
> > Doesn't this make the sub-optimal global behavior the same on the cgroup
> > level when some parts will benefit while others will not?
> >
>
> I think the key idea for the sub-optimal behavior is "predictability",
> so we know for sure which apps/services would consume THPs.
OK, that seems fair.
> We observed a significant THP usage on almost idle Ubuntu server, with simple test running,
> (some random system services consumed few hundreds Mb of THPs).
I assume that you are using "always" as the global default configuration,
right? If that is the case then high (in fact as high as feasible) THP
utilization is the actual goal. If you want more targeted THP use then
madvise is what you are looking for. This will not help applications
which are not THP aware of course, but then we are back to the discussion
of whether the interface should be a) per process, b) per cgroup or
c) process_madvise.
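For reference, this is roughly what the targeted, THP-aware path looks
like from the application side when the global mode is "madvise". It is a
minimal sketch, not taken from the patch set; alloc_thp_hinted() is a
hypothetical helper name. The hint is advisory, so a failure of madvise()
(e.g. on a kernel built without THP) is deliberately ignored here:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map an anonymous region and ask the kernel to back it with
 * transparent huge pages where possible.
 * Returns the mapping, or NULL if mmap() itself failed.
 */
void *alloc_thp_hinted(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	/* Advisory only: the mapping stays usable if the hint is rejected. */
	(void)madvise(p, len, MADV_HUGEPAGE);
	return p;
}
```

Only ranges hinted this way become THP candidates in "madvise" mode, which
gives the predictability asked for above, at the price of touching the
application.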
> Of course, on other distros me might have different situation.
> But with fine-grained per-group control it's a lot more predictable.
>
> Am i got you question right?
Not really, but at least I do understand (hopefully) that you are trying
to work around THP overuse by changing the global default to be more
restrictive while allowing some workloads to be less restrictive. The
question of why pushing that down to the memcg scope makes the situation
better is not answered AFAICT.
[...]
> > So if the parent decides that none of the children should be using THP
> > they can override that so the tuning at parent has no imperative
> > control. This is breaking hierarchical property that is expected from
> > cgroup control files.
>
> Actually, i think we can solve this.
> As we mostly need just a single children level,
> "flat" case (root->child) is enough, interpreting root-memcg THP mode as "global THP setting",
> where sub-children are forbidden to override an inherited THP-mode.
This reduced case is not really sufficient to justify the
non-hierarchical semantic, I am afraid. There must be a _really_ strong
case to break this property and even then I am rather skeptical to be
honest. We have been burnt by introducing stuff like memcg.swappiness
that seemed like a good idea initially but backfired with unexpected
behavior for many users.
--
Michal Hocko
SUSE Labs