Message-ID: <20200227133544.GA20690@blackbody.suse.cz>
Date: Thu, 27 Feb 2020 14:35:44 +0100
From: Michal Koutný <mkoutny@...e.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...e.com>,
Tejun Heo <tj@...nel.org>, linux-mm@...ck.org,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
kernel-team@...com
Subject: Re: [PATCH v2 3/3] mm: memcontrol: recursive memory.low protection
TL;DR I see merit in the recursive propagation if it's requested
explicitly (i.e. retaining the meaning of 0). The protection/weight
semantics should be refined.
On Wed, Feb 26, 2020 at 10:05:48AM -0500, Johannes Weiner <hannes@...xchg.org> wrote:
> They still ultimately translate to real resources. The concrete value
> depends on what the parent's weight translates to, and it depends on
> sibling configurations and their current consumption. (All of this is
> already true for memory protection as well, btw). But eventually, a
> weight specification translates to actual time on a CPU, bandwidth on
> an IO device etc.
>
> > - sum of sibling weights is meaningless (and independent from parent
> > weight)
>
> Technically true for overcommitted memory.low values as well.
Yes, but only for overcommitted ones. For pure weights it doesn't matter
whether you set 1:10, 10:100 or 100:1000. For protection, however, this
holds only when approaching infinity; since the sum is compared to the
parent's value, the protection behaves differently.
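To make the contrast concrete, here is a minimal Python sketch under a deliberately simplified model (an assumption, not the kernel's code, and ignoring usage): a pure weight yields a share of the parent's capacity that depends only on the ratio, whereas a configured protection is honoured as-is unless the siblings' sum exceeds the parent's protection, in which case it is scaled down proportionally. The 10 GiB parent and all values (in MiB) are hypothetical.

```python
def weight_shares(capacity, weights):
    """Split capacity by pure weights: only the ratio matters."""
    total = sum(weights)
    return [capacity * w // total for w in weights]


def protection_shares(parent_protection, lows):
    """Simplified memory.low model (assumption): configured values are
    honoured unless they overcommit the parent, then scaled down."""
    total = sum(lows)
    if total <= parent_protection:
        return list(lows)
    return [parent_protection * low // total for low in lows]


PARENT = 10240  # hypothetical parent capacity/protection, MiB

for pair in ([1, 10], [10, 100], [100, 1000]):
    print("weights", pair, "->", weight_shares(PARENT, pair))

for pair in ([100, 1000], [1000, 10000], [10000, 100000]):
    print("low", pair, "->", protection_shares(PARENT, pair))
```

All three weight pairs give the same split. The protection pairs match it only once they overcommit the parent; at 100:1000 MiB the absolute values are what count.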
[If there were to be some pure memory weights, they would, for
instance, express the relative affinity of a group's pages to physical memory.]
> I don't see a fundamental difference between them. And that in turn
> makes it hard for me to accept that hierarchical inheritance rules
> should be different.
I'll try to come up with better examples of the difference I
perceive.
> "Wrong" isn't the right term. Is it what you wanted to express in your
> configuration?
I want to express an absolute amount of memory (ideally representing the
working set size) under protection.
IIUC, you want to express general relative priorities of B vs C when
some outer metric has to be maintained, given that you reach both the
memory and the IO limits.
> You are talking about a mathematical truth on a per-controller
> basis. What I'm saying is that I don't see how this is useful for real
> workloads, their relative priorities, and the performance expectations
> users have from these priorities.
> With a priority inversion like this, there is no actual performance
> isolation or containerization going on here - which is the whole point
> of cgroups and resource control.
I acknowledge that by pressing too much along one dimension (memory) you
induce expansion in another dimension (IO), and that this may become
noticeable in siblings (expansion over saturation [1]). But that's
expected when only weights are in use. If you wanted to hide the effect
of workload B from C, B would need a real limit.
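A toy model of that effect, assuming the IO device is shared by a work-conserving weighted max-min fair scheduler (a stand-in chosen for illustration, not the kernel's actual IO controller): when B's IO demand grows because its memory is reclaimed and refaulted, C drops to its weighted share, and only a hard cap on B, playing the role of a real limit, keeps C's allocation intact. The bandwidth and demand figures are hypothetical; the weights reuse 200 and 50.

```python
def weighted_max_min(capacity, demands, weights, limits=None):
    """Weighted max-min fair (work-conserving) split of `capacity`.

    A group never receives more than min(demand, limit); capacity it
    leaves unused is redistributed to the others in proportion to weight.
    """
    n = len(demands)
    if limits is None:
        limits = [None] * n
    caps = [d if lim is None else min(d, lim) for d, lim in zip(demands, limits)]
    alloc = [0.0] * n
    active = [i for i in range(n) if caps[i] > 0]
    remaining = float(capacity)
    while active:
        total_w = sum(weights[i] for i in active)
        share = {i: remaining * weights[i] / total_w for i in active}
        satisfied = [i for i in active if caps[i] <= share[i]]
        if not satisfied:
            # Everyone is still hungry: split what is left by weight.
            for i in active:
                alloc[i] = share[i]
            break
        # Fully serve groups whose demand fits, then re-split the rest.
        for i in satisfied:
            alloc[i] = caps[i]
            remaining -= caps[i]
        active = [i for i in active if i not in satisfied]
    return alloc


# Hypothetical: 100 MB/s device, groups [B, C] with io weights 200 and 50.
calm = weighted_max_min(100, [30, 40], [200, 50])
pressured = weighted_max_min(100, [150, 40], [200, 50])  # B refaulting
limited = weighted_max_min(100, [150, 40], [200, 50], limits=[60, None])
print(calm, pressured, limited)
```

Between the first two calls only B's demand changes, yet C loses half its bandwidth; with B capped, C gets its full demand back.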
[I beg to disagree that containerization is the whole point of cgroups;
it's a large part of it, hence the isolation needn't necessarily be
bidirectional.]
> My objection is to opting out of protection against cousins (thus
> overriding parental resource assignment), not against siblings.
Just to sync up the terminology: I'm calling this protection against
uncles (the composition/structure under them is irrelevant), and the
limitation comes from the grandparent or higher (or is global).
...and the overridden parental resource assignment is the expansion in the
non-memory dimension (IO/CPU).
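For reference, the relationships can be spelled out on a small hypothetical tree (A with children B and C, B with children D and E): D's sibling is E, and its uncle is C, the sibling of its parent B. Whatever sits below C does not change that relationship. The helper names are illustrative only.

```python
# Hypothetical tree: child -> parent.
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B"}


def siblings(node, parent=PARENT):
    """Other children of the node's parent."""
    p = parent.get(node)
    return sorted(n for n, q in parent.items() if q == p and n != node)


def uncles(node, parent=PARENT):
    """Siblings of the node's parent; their own subtrees don't matter."""
    p = parent.get(node)
    if p is None:
        return []
    return siblings(p, parent)


print("siblings of D:", siblings("D"))
print("uncles of D:", uncles("D"))
```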
> Correct, but you can change the tree to this:
>
> A.low=10G
> `- A1.low=10G
>    `- B.low=0G
>    `- C.low=0G
> `- D.low=0G
>
> to express
>
> A1 > D
> B = C
That sort of works (if I give up the scapegoat), although I have trouble
with having to copy the value from A to A1: I could have done that with
the previous hierarchy and simply set B.low=C.low=10G.
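A sketch that evaluates both trees, assuming a simplified effective-protection model in the spirit of this patch series (not a copy of mm/memcontrol.c): configured claims, capped at usage, are honoured unless they overcommit the parent, and, only in recursive mode, unclaimed parental protection is handed to children in proportion to their unprotected usage. The usage figures (GiB) and the flat variant, with B, C and D directly under A, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cg:
    name: str
    low: int = 0    # configured memory.low, GiB
    usage: int = 0  # usage of a leaf, GiB (hypothetical)
    children: list = field(default_factory=list)

    def total_usage(self):
        if not self.children:
            return self.usage
        return sum(c.total_usage() for c in self.children)


def effective_protection(usage, parent_usage, setting, parent_effective,
                         siblings_protected, recursive):
    # Simplified model (assumption), not the kernel's code.
    protected = min(usage, setting)
    if siblings_protected > parent_effective:
        # Overcommitted: scale claims down to what the parent affords.
        return protected * parent_effective // siblings_protected
    ep = protected
    if (recursive and parent_effective > siblings_protected
            and parent_usage > siblings_protected and usage > protected):
        # Hand out unclaimed parental protection by unprotected usage.
        unclaimed = parent_effective - siblings_protected
        ep += unclaimed * (usage - protected) // (parent_usage - siblings_protected)
    return ep


def compute(root, recursive):
    result = {root.name: min(root.total_usage(), root.low)}

    def walk(node):
        parent_usage = node.total_usage()
        sib = sum(min(c.total_usage(), c.low) for c in node.children)
        for c in node.children:
            result[c.name] = effective_protection(
                c.total_usage(), parent_usage, c.low,
                result[node.name], sib, recursive)
            walk(c)

    walk(root)
    return result


nested = Cg("A", low=10, children=[
    Cg("A1", low=10, children=[Cg("B", usage=5), Cg("C", usage=5)]),
    Cg("D", usage=6),
])
flat = Cg("A", low=10, children=[
    Cg("B", low=10, usage=5), Cg("C", low=10, usage=5), Cg("D", usage=6),
])
print("nested, recursive:    ", compute(nested, recursive=True))
print("nested, non-recursive:", compute(nested, recursive=False))
print("flat, non-recursive:  ", compute(flat, recursive=False))
```

With recursive propagation B and C split A1's protection equally while D gets nothing; without it, the zero settings leave B and C unprotected. The flat tree with B.low=C.low=10G reaches the same B/C/D outcome in non-recursive mode.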
> That is, I would like to see an argument for this setup:
>
> A
> `- B io.weight=200 memory.low=10G
>    `- D io.weight=100 (e.g.) memory.low=10G
>    `- E io.weight=100 (e.g.) memory.low=0
> `- C io.weight=50 memory.low=5G
>
> Where E has no memory protection against C, but E has IO priority over
> C. That's the configuration that cannot be expressed with a recursive
> memory.low, but since it involves priority inversions it's not useful
> to actually isolate and containerize workloads.
But there may be no cousin (uncle) at all or, more precisely, it's the
global rest of the system that we don't mind affecting.
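To quantify the inversion in that configuration, a sketch under stated assumptions: every group saturates the device, so its IO share is the product of weight ratios from A downward; leaf usages are hypothetical; and memory protection follows a simplified model (not the kernel's code) with and without recursive distribution. E ends up with twice C's IO share but no memory protection in non-recursive mode, while in recursive mode it can pick up part of B's unclaimed protection.

```python
from fractions import Fraction

# Tree from the example; usages are hypothetical (GiB).
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B"}
IO_WEIGHT = {"B": 200, "C": 50, "D": 100, "E": 100}
LOW = {"B": 10, "C": 5, "D": 10, "E": 0}
LEAF_USAGE = {"C": 5, "D": 8, "E": 4}


def children(node):
    return [c for c, p in PARENT.items() if p == node]


def usage(node):
    kids = children(node)
    return LEAF_USAGE[node] if not kids else sum(usage(c) for c in kids)


def io_share(node):
    """Fraction of A's IO when all groups are saturated."""
    parent = PARENT.get(node)
    if parent is None:
        return Fraction(1)
    total = sum(IO_WEIGHT[s] for s in children(parent))
    return io_share(parent) * Fraction(IO_WEIGHT[node], total)


def mem_protection(node, recursive):
    """Simplified effective memory.low (assumption, not mm/memcontrol.c)."""
    parent = PARENT[node]
    protected = min(usage(node), LOW[node])
    if parent == "A":
        # A sets no memory.low: treat its children as top level.
        return protected
    parent_eff = mem_protection(parent, recursive)
    sibs = sum(min(usage(s), LOW[s]) for s in children(parent))
    if sibs > parent_eff:
        return protected * parent_eff // sibs
    ep = protected
    parent_usage = usage(parent)
    if (recursive and parent_eff > sibs and parent_usage > sibs
            and usage(node) > protected):
        ep += (parent_eff - sibs) * (usage(node) - protected) // (parent_usage - sibs)
    return ep


for n in ("C", "D", "E"):
    print(n, "io", io_share(n), "low", mem_protection(n, False),
          "recursive low", mem_protection(n, True))
```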
> > I'd say that protected memory is a disposable resource in contrast with
> > CPU/IO. If you don't have latter, you don't progress; if you lack the
> > former, you are refaulting but can make progress. Even more, you should
> > be able to give up memory.min.
>
> Eh, I'm not buying that. You cannot run without memory either. If
> somebody reclaims a page between you faulting it in and you resuming
> to userspace, there is no forward progress.
I made a hasty argument (misinterpreting the constant outer reclaim
pressure), so that wasn't the fundamental difference.
As for the second part: memory.min is subject to the same calculation as
memory.low. Do you also consider the scapegoat that prevents OOM in the
grandparent (or higher) subtree a misfeature/artifact?
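The scapegoat effect can be illustrated with a hypothetical subtree: P has an effective memory.min of 10G, X claims 10G but uses 6G, and S with min=0 uses 4G. Applying the same kind of simplified model as for memory.low (an assumption, not the kernel's code), S's usage stays reclaimable in non-recursive mode, whereas recursive distribution hands S the unclaimed 4G and leaves nothing to reclaim when a limit in the grandparent or higher forces reclaim, so the result would be OOM instead.

```python
def effective_min(usage, parent_usage, setting, parent_effective,
                  siblings_protected, recursive):
    """Simplified effective memory.min (assumption, same shape as low)."""
    protected = min(usage, setting)
    if siblings_protected > parent_effective:
        return protected * parent_effective // siblings_protected
    ep = protected
    if (recursive and parent_effective > siblings_protected
            and parent_usage > siblings_protected and usage > protected):
        ep += ((parent_effective - siblings_protected) * (usage - protected)
               // (parent_usage - siblings_protected))
    return ep


def reclaimable(parent_emin, children, recursive):
    """Memory outside every child's effective min, i.e. what reclaim
    driven from above could still take from this subtree.

    children: list of (min_setting, usage) pairs in GiB (hypothetical).
    """
    parent_usage = sum(u for _, u in children)
    sibs = sum(min(u, m) for m, u in children)
    total = 0
    for m, u in children:
        emin = effective_min(u, parent_usage, m, parent_emin, sibs, recursive)
        total += max(u - emin, 0)
    return total


# P.min=10G: X claims 10G but uses 6G; S (min=0) is the scapegoat using 4G.
KIDS = [(10, 6), (0, 4)]
print("non-recursive:", reclaimable(10, KIDS, recursive=False))
print("recursive:    ", reclaimable(10, KIDS, recursive=True))
```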
Thanks,
Michal
[1] Here I'm taking your/Tejun's assumption that in stressful
situations it always boils down to IO, although I don't have any
quantitative arguments for that.