Message-ID: <20240807110341.GA1726375@cmpxchg.org>
Date: Wed, 7 Aug 2024 07:03:41 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Nhat Pham <nphamcs@...il.com>
Cc: akpm@...ux-foundation.org, yosryahmed@...gle.com,
shakeel.butt@...ux.dev, linux-mm@...ck.org, kernel-team@...a.com,
linux-kernel@...r.kernel.org, flintglass@...il.com,
chengming.zhou@...ux.dev
Subject: Re: [PATCH v3 1/2] zswap: implement a second chance algorithm for
dynamic zswap shrinker
On Mon, Aug 05, 2024 at 04:22:42PM -0700, Nhat Pham wrote:
> The current zswap shrinker's heuristic to prevent overshrinking is
> brittle and inaccurate, specifically in the way we decay the protection
> size (i.e., making pages in the zswap LRU eligible for reclaim).
>
> We currently decay the protection aggressively in zswap_lru_add()
> calls. This leads to the following unfortunate effect: when a new batch
> of pages enters zswap, the protection size rapidly decays to below 25%
> of the zswap LRU size, which is way too low.
>
> We have observed this effect in production when experimenting with the
> zswap shrinker: the rate of shrinking shoots up massively right after a
> new batch of zswap stores. This is the opposite of what we originally
> want - when new pages enter zswap, we want to protect both these new
> pages AND the pages that are already protected in the zswap LRU.
>
> Replace the existing heuristics with a second chance algorithm:
>
> 1. When a new zswap entry is stored in the zswap pool, its referenced
> bit is set.
> 2. When the zswap shrinker encounters a zswap entry with the referenced
> bit set, give it a second chance - only flip the referenced bit and
> rotate it in the LRU.
> 3. If the shrinker encounters the entry again, this time with its
> referenced bit unset, then it can reclaim the entry.
>
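
The three steps above are the classic second-chance (clock) policy applied
to the zswap LRU. A minimal sketch of the per-entry decision, using a
hypothetical helper name - the real LRU walk callback, its locking, and the
writeback path in zswap.c are more involved:

	/*
	 * Illustrative only: decide what to do with one entry during the
	 * shrinker's LRU walk.  At store time the entry was created with
	 * entry->referenced = true.
	 */
	static bool entry_should_be_written_back(struct zswap_entry *entry)
	{
		if (entry->referenced) {
			/* First encounter: consume the second chance. */
			entry->referenced = false;
			return false;	/* caller rotates entry to LRU tail */
		}

		/* Second encounter with the bit clear: reclaim it. */
		return true;		/* caller writes the entry back */
	}
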
> In this manner, the aging of the pages in the zswap LRUs is decoupled
> from zswap stores, and picks up the pace with increasing memory
> pressure (which is what we want).
>
> The second chance scheme allows us to modulate the writeback rate based
> on recent pool activity. Entries that recently entered the pool will be
> protected, so if the pool is dominated by such entries the writeback
> rate will drop proportionally, protecting the workload's working set.
> On the other hand, stale entries will be written back quickly, which
> increases the effective writeback rate.
>
> The referenced bit is added in the hole after the `length` field of
> struct zswap_entry, so there is no extra space overhead for this
> algorithm.
>
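
For reference, the bit slots into the existing padding roughly like this
(abridged field list, comments mine):

	struct zswap_entry {
		swp_entry_t swpentry;	/* 8 bytes */
		unsigned int length;	/* 4 bytes, followed by a 4-byte hole */
		bool referenced;	/* reuses part of that hole: no growth */
		struct zswap_pool *pool;
		/* ... remaining fields unchanged ... */
	};
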
> We will still maintain the count of swapins, which is consumed and
> subtracted from the LRU size in zswap_shrinker_count(), to further
> penalize past overshrinking that led to disk swapins. The idea is that
> had we considered this many more pages in the LRU active/protected,
> they would not have been written back and we would not have had to swap
> them in.
>
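
In other words, pages that were recently faulted back in from disk are
treated as if they were still protected in the zswap LRU. A sketch of the
idea, with illustrative names rather than the actual zswap_shrinker_count()
code (which also scales the result by the pool's compression ratio):

	static unsigned long sketch_shrinker_count(unsigned long lru_size,
						   atomic_long_t *recent_swapins)
	{
		unsigned long swapins = atomic_long_read(recent_swapins);
		unsigned long penalty = min(swapins, lru_size);

		/* Consume the portion of the penalty we just applied. */
		atomic_long_sub(penalty, recent_swapins);

		return lru_size - penalty;
	}
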
> To test the new heuristic, I built the kernel under a cgroup with
> memory.max set to 2G, on a host with 36 cores:
>
> With the old shrinker:
>
> real: 263.89s
> user: 4318.11s
> sys: 673.29s
> swapins: 227300.5
>
> With the second chance algorithm:
>
> real: 244.85s
> user: 4327.22s
> sys: 664.39s
> swapins: 94663
>
> (average over 5 runs)
>
> We observe a 1.3% reduction in kernel CPU usage and around a 7.2%
> reduction in real time. Note that the number of swapped-in pages
> dropped by 58%.
>
> Suggested-by: Johannes Weiner <hannes@...xchg.org>
> Signed-off-by: Nhat Pham <nphamcs@...il.com>
Acked-by: Johannes Weiner <hannes@...xchg.org>