[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABdmKX2K4MMe9rsKfWi9RxUS5G1RkLVzuUkPnovt5O2hqVmbWA@mail.gmail.com>
Date: Tue, 23 Jan 2024 05:58:05 -0800
From: "T.J. Mercier" <tjmercier@...gle.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeelb@...gle.com>, Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, android-mm@...gle.com, yuzhao@...gle.com,
yangyifei03@...ishou.com, cgroups@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Revert "mm:vmscan: fix inaccurate reclaim during
proactive reclaim"
On Tue, Jan 23, 2024 at 1:33 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Sun 21-01-24 21:44:12, T.J. Mercier wrote:
> > This reverts commit 0388536ac29104a478c79b3869541524caec28eb.
> >
> > Proactive reclaim on the root cgroup is 10x slower after this patch when
> > MGLRU is enabled, and completion times for proactive reclaim on much
> > smaller non-root cgroups take ~30% longer (with or without MGLRU).
>
> What is the reclaim target in these pro-active reclaim requests?
Two targets:
1) /sys/fs/cgroup/memory.reclaim
2) /sys/fs/cgroup/uid_0/memory.reclaim (a bunch of Android system services)
Note that lru_gen_shrink_node is used for 1, but shrink_node_memcgs is
used for 2.
The 10x comes from the rate of reclaim (~70k pages/sec vs ~6.6k
pages/sec) for 1. After this revert the root reclaim took only about
10 seconds. Before the revert it's still running after about 3 minutes
using a core at 100% the whole time, and I'm too impatient to wait
longer to record times for comparison.
The 30% comes from the average of a few runs for 2:
Before revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time
echo "" > /sys/fs/cgroup/uid_0/memory.reclaim'
restarting adbd as root
0m09.69s real 0m00.00s user 0m09.19s system
After revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time
echo "" > /sys/fs/cgroup/uid_0/memory.reclaim'
0m07.31s real 0m00.00s user 0m06.44s system
It's actually a bigger difference for smaller reclaim amounts:
Before revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time
echo "3G" > /sys/fs/cgroup/uid_0/memory.reclaim'
0m12.04s real 0m00.00s user 0m11.48s system
After revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time
echo "3G" > /sys/fs/cgroup/uid_0/memory.reclaim'
0m06.65s real 0m00.00s user 0m05.91s system
> > With
> > root reclaim before the patch, I observe average reclaim rates of
> > ~70k pages/sec before try_to_free_mem_cgroup_pages starts to fail and
> > the nr_retries counter starts to decrement, eventually ending the
> > proactive reclaim attempt.
>
> Do I understand correctly that the reclaim target is over estimated and
> you expect that the reclaim process breaks out early
Yes. I expect memory_reclaim to fail at some point when it becomes
difficult/impossible to reclaim pages where I specify a large amount
to reclaim. The ask here is, "please reclaim as much as possible from
this cgroup, but don't take all day". But it takes minutes to get
there on the root cgroup, working SWAP_CLUSTER_MAX pages at a time.
> > After the patch the reclaim rate is
> > consistently ~6.6k pages/sec due to the reduced nr_pages value causing
> > scan aborts as soon as SWAP_CLUSTER_MAX pages are reclaimed. The
> > proactive reclaim doesn't complete after several minutes because
> > try_to_free_mem_cgroup_pages is still capable of reclaiming pages in
> > tiny SWAP_CLUSTER_MAX page chunks and nr_retries is never decremented.
>
> I do not understand this part. How does a smaller reclaim target manages
> to have reclaimed > 0 while larger one doesn't?
They both are able to make progress. The main difference is that a
single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon
after it reclaims nr_to_reclaim, and before it touches all memcgs. So
a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish
pages with MGLRU. WIthout MGLRU the memcg walk is not aborted
immediately after nr_to_reclaim is reached, so a single call to
try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
even when sc->nr_to_reclaim is 32. (I.E. MGLRU overreclaims less.)
https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@google.com/
Powered by blists - more mailing lists