Message-Id: <20220322182248.29121-1-mkoutny@suse.com>
Date: Tue, 22 Mar 2022 19:22:48 +0100
From: Michal Koutný <mkoutny@...e.com>
To: cgroups@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: Richard Palethorpe <rpalethorpe@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Shakeel Butt <shakeelb@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Michal Hocko <mhocko@...e.com>,
Vlastimil Babka <vbabka@...e.cz>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Muchun Song <songmuchun@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
Yang Shi <shy828301@...il.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Tejun Heo <tj@...nel.org>, Chris Down <chris@...isdown.name>
Subject: [RFC PATCH] mm: memcg: Do not count memory.low reclaim if it does not happen
This was observed with the memcontrol selftest / a new LTP test but can
also be reproduced in a simplified setup of two siblings:
  parent        .low=50M
  `- s1         .low=50M  .current=50M+ε
  `- s2         .low=0M   .current=50M
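For concreteness, a minimal userspace sketch of creating this hierarchy
(cgroup v2 mounted at /sys/fs/cgroup is assumed; populating the siblings
and triggering the reclaim are left out, and error handling is omitted):

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return;	/* error handling omitted in this sketch */
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	/* assumes the memory controller is available on this mount;
	 * all names and values below are taken from the example */
	write_str("/sys/fs/cgroup/cgroup.subtree_control", "+memory");
	mkdir("/sys/fs/cgroup/parent", 0755);
	write_str("/sys/fs/cgroup/parent/cgroup.subtree_control", "+memory");
	mkdir("/sys/fs/cgroup/parent/s1", 0755);
	mkdir("/sys/fs/cgroup/parent/s2", 0755);

	write_str("/sys/fs/cgroup/parent/memory.low", "50M");
	write_str("/sys/fs/cgroup/parent/s1/memory.low", "50M");
	write_str("/sys/fs/cgroup/parent/s2/memory.low", "0");

	/* populate s1 to 50M+ε and s2 to 50M, then induce reclaim from
	 * above parent (e.g. via an ancestor's memory.max) and check
	 * s2/memory.events:low */
	return 0;
}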
The expectation is that s2/memory.events:low remains zero under an outer
reclaimer, since no protection should be given to cgroup s2 (even with
memory_recursiveprot).
However, this does not happen. The apparent reason is that when s1 is
considered for (proportional) reclaim, the scanned proportion is rounded
up to SWAP_CLUSTER_MAX and a slightly over-proportional amount is
reclaimed. Consequently, when the effective low value of s2 is
calculated, it observes the parent's protection left unclaimed by s1
(SWAP_CLUSTER_MAX - ε in theory) and effectively appropriates it.
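A minimal userspace sketch of that arithmetic (4KiB pages and a
simplified, non-proportional redistribution of the slack are assumptions
of the sketch, not the exact in-kernel computation):

#include <stdio.h>

#define MiB		(1024UL * 1024)
#define PAGE_SIZE	4096UL
#define SWAP_CLUSTER_MAX 32UL	/* pages */

int main(void)
{
	unsigned long s1_low = 50 * MiB, s2_low = 0;
	unsigned long eps = 16 * PAGE_SIZE;	/* some small ε */
	unsigned long s1_cur = 50 * MiB + eps;

	/* the scan target is rounded up to SWAP_CLUSTER_MAX pages,
	 * so s1 ends up reclaimed below its memory.low */
	s1_cur -= SWAP_CLUSTER_MAX * PAGE_SIZE;

	/* protection s1 now leaves unclaimed: SWAP_CLUSTER_MAX - ε */
	unsigned long unclaimed = s1_low - s1_cur;

	/* with memory_recursiveprot, s2 (the only other sibling here)
	 * can appropriate the slack, so its effective low exceeds its
	 * nominal memory.low of 0 */
	unsigned long s2_elow = s2_low + unclaimed;

	printf("unclaimed: %lu KiB, s2 elow: %lu KiB\n",
	       unclaimed / 1024, s2_elow / 1024);
	return 0;
}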
The effect is a slightly shifted, workload-dependent distribution of
protection between the siblings and a misreported MEMCG_LOW event when
s2 is reclaimed while carrying this appropriated protection.
Fix the behavior by not reporting a breached memory.low in such
situations. (This also affects setups where all siblings have
memory.low=0: parent's memory.events:low will still be non-zero when
parent's memory.low is breached, but it will be reduced by the events
that originated in the children.)
Fixes: 8a931f801340 ("mm: memcontrol: recursive memory.low protection")
Reported-by: Richard Palethorpe <rpalethorpe@...e.com>
Link: https://lore.kernel.org/all/20220321101429.3703-1-rpalethorpe@suse.com/
Signed-off-by: Michal Koutný <mkoutny@...e.com>
---
include/linux/memcontrol.h | 8 ++++----
mm/vmscan.c | 5 +++--
2 files changed, 7 insertions(+), 6 deletions(-)
Why is this RFC?
1) It changes the number of events observed in parent/memory.events:low
   (especially for truly recursive configs where all children specify
   memory.low=0). IIUC the past discussions about the equivalence of
   all-zeros and all-infinities, the eagerly reported MEMCG_LOW events
   (in the latter case) were deemed to skew the stats [1].
2) The observed behavior slightly impacts the distribution of parent's
   memory.low. A constructed example is a passive protected workload in
   s1 and an active one in s2 (active ~ counteracts the reclaim with
   allocations). It could strip the protection from s1 one piece at a
   time (one := SWAP_CLUSTER_MAX/2^sc.priority). That may be considered
   either wrong (s1 should have been more protected) or correct (s2
   deserves the protection due to its activity).
   I don't have (didn't collect) data for this, so I think just masking
   the false events is sufficient (or an independent matter); a small
   mock of the new check follows the reference below.
[1] https://lore.kernel.org/r/20200221185839.GB70967@cmpxchg.org
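To make that concrete, here is a minimal userspace mock of the patched
mem_cgroup_below_low() check (the struct and the values are
illustrative, not the kernel's real types): a MEMCG_LOW event is counted
only when the nominal memory.low is breached, not when the cgroup merely
enjoys effective protection inherited from the parent.

#include <stdbool.h>
#include <stdio.h>

struct memcg_mock {
	unsigned long usage;	/* ~ page_counter_read(&memcg->memory) */
	unsigned long low;	/* nominal memory.low */
	unsigned long elow;	/* effective low after distribution */
};

static bool below_low(const struct memcg_mock *m, bool effective)
{
	/* mirrors the patched check: compare against elow or nominal low */
	return m->usage <= (effective ? m->elow : m->low);
}

int main(void)
{
	/* s2 from the example: low=0, but elow>0 thanks to the slack
	 * appropriated from the parent (values are illustrative) */
	struct memcg_mock s2 = {
		.usage	= 50UL << 20,
		.low	= 0,
		.elow	= (50UL << 20) + 4096,
	};

	if (below_low(&s2, true) && !below_low(&s2, false))
		printf("s2 skipped by reclaim, but no MEMCG_LOW counted\n");
	return 0;
}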
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0abbd685703b..99ac72e00bff 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -626,13 +626,13 @@ static inline bool mem_cgroup_supports_protection(struct mem_cgroup *memcg)
 
 }
 
-static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
+static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg, bool effective)
 {
 	if (!mem_cgroup_supports_protection(memcg))
 		return false;
 
-	return READ_ONCE(memcg->memory.elow) >=
-		page_counter_read(&memcg->memory);
+	return page_counter_read(&memcg->memory) <= (effective ?
+		READ_ONCE(memcg->memory.elow) : READ_ONCE(memcg->memory.low));
 }
 
 static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
@@ -1177,7 +1177,7 @@ static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 {
 }
 
-static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
+static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg, bool effective)
 {
 	return false;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 59b14e0d696c..3bdb35d6bee6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3152,7 +3152,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 			 * If there is no reclaimable memory, OOM.
 			 */
 			continue;
-		} else if (mem_cgroup_below_low(memcg)) {
+		} else if (mem_cgroup_below_low(memcg, true)) {
 			/*
 			 * Soft protection.
 			 * Respect the protection only as long as
@@ -3163,7 +3163,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 				sc->memcg_low_skipped = 1;
 				continue;
 			}
-			memcg_memory_event(memcg, MEMCG_LOW);
+			if (mem_cgroup_below_low(memcg, false))
+				memcg_memory_event(memcg, MEMCG_LOW);
 		}
 
 		reclaimed = sc->nr_reclaimed;
--
2.35.1