[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJD7tkbqtrJqD9=5f-ZZBcWyX9t-e=fenJdDU5U=GDpbbWrzrw@mail.gmail.com>
Date: Thu, 28 Dec 2023 07:13:23 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Shakeel Butt <shakeelb@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Johannes Weiner <hannes@...xchg.org>,
Yu Zhao <yuzhao@...gle.com>, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: ratelimit stat flush from workingset shrinker
On Wed, Dec 27, 2023 at 11:31 PM Shakeel Butt <shakeelb@...gle.com> wrote:
>
> One of our internal workload regressed on newer upstream kernel and on
> further investigation, it seems like the cause is the always synchronous
> rstat flush in the count_shadow_nodes() added by the commit f82e6bf9bb9b
> ("mm: memcg: use rstat for non-hierarchical stats"). On further
> inspection it seems like we don't really need accurate stats in this
> function as it was already approximating the amount of appropriate
> shadow entried to keep for maintaining the refault information. Since
s/entried/entries
> there is already 2 sec periodic rstat flush, we don't need exact stats
> here. Let's ratelimit the rstat flush in this code path.
Is the regression observed even with commit 7d7ef0a4686a ("mm: memcg:
restore subtree stats flushing")? I think the answer is yes based on
internal discussions, but this really surprises me.
Commit f82e6bf9bb9b removed the percpu loop in
lruvec_page_state_local(), and added a flush call. With 7d7ef0a4686a,
the flush call is only effective if there are pending updates in the
cgroup subtree exceeding MEMCG_CHARGE_BATCH * num_online_cpus(). IOW,
we should only be doing work when actually needed, whereas before we
used to have multiple percpu loops in count_shadow_nodes() regardless
of pending updates.
It seems like the cgroup subtree is very active that we continuously
need to flush in count_shadow_nodes()? If that's the case, do we still
think it's okay not to flush when we know there are pending updates? I
don't have enough background about the workingset heuristics to judge
this.
I am not objecting to this change, I am just trying to understand
what's happening.
Thanks!
>
> Fixes: f82e6bf9bb9b ("mm: memcg: use rstat for non-hierarchical stats")
> Signed-off-by: Shakeel Butt <shakeelb@...gle.com>
> ---
> mm/workingset.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/workingset.c b/mm/workingset.c
> index 2a2a34234df9..226012974328 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -680,7 +680,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
> struct lruvec *lruvec;
> int i;
>
> - mem_cgroup_flush_stats(sc->memcg);
> + mem_cgroup_flush_stats_ratelimited(sc->memcg);
> lruvec = mem_cgroup_lruvec(sc->memcg, NODE_DATA(sc->nid));
> for (pages = 0, i = 0; i < NR_LRU_LISTS; i++)
> pages += lruvec_page_state_local(lruvec,
> --
> 2.43.0.472.g3155946c3a-goog
>
Powered by blists - more mailing lists