Message-ID: <CAKEwX=NSaRAjiKjGtYxPwh9ByBZ_DK+h3T6LS5-eNpxS4s4zPA@mail.gmail.com>
Date: Tue, 11 Jun 2024 08:51:02 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Takero Funaki <flintglass@...il.com>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, Johannes Weiner <hannes@...xchg.org>, 
	Chengming Zhou <chengming.zhou@...ux.dev>, Jonathan Corbet <corbet@....net>, 
	Andrew Morton <akpm@...ux-foundation.org>, 
	Domenico Cerasuolo <cerasuolodomenico@...il.com>, linux-mm@...ck.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 2/3] mm: zswap: fix global shrinker error handling logic

On Tue, Jun 11, 2024 at 8:21 AM Takero Funaki <flintglass@...il.com> wrote:
>
> Since shrink_worker evicts only one page per tree walk when there is
> only one memcg using zswap, I believe this is the intended behavior.

I don't think this is the intended behavior :) It's a holdover from
the old zswap reclaiming behaviors.

1. In the past, we used to shrink one object per shrink worker call.
This is crazy.

2. We then moved the LRU from the allocator level to the zswap level,
and shrank one object at a time until the pool could accept new pages
(i.e. until we were under the acceptance threshold).

3. When we separated the LRU into per-(memcg, node) lists, we kept the
shrink-one-at-a-time part, but did it round-robin style across the
(memcg, node) combinations.

It's time to optimize this. 4th time's the charm!

> Even if we choose to break the loop more aggressively, it would only
> be postponing the problem because pool_limit_hit will trigger the
> worker again.
>
> I agree the existing approach is inefficient. It might be better to
> change the one-page-per-walk round-robin strategy.

We can play with a bigger batch.

1. The most straightforward idea is to just use a bigger batch constant (32? 64? 128?)

2. We can try to shrink each memcg until the pool can accept pages
again, hoping that the round-robin selection maintains fairness in the
long run - but this can be a bad idea in the short run for the selected
memcg. At the very least, this should try to respect the protected
area of each lruvec. This might still conflict with the zswap shrinker
though (since the protection is best-effort).

3. Proportional reclaim - a variant of what we're doing in
get_scan_count() for page reclaim?

scan = lruvec_size - lruvec_size * protection / (cgroup_size + 1);

protection is derived from memory.min or memory.low of the cgroup, and
cgroup_size is the memory usage of the cgroup. For lruvec_size, maybe
we can substitute the number of (reclaimable/unprotected?) zswap
objects on the (node, memcg) LRU?
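
To make that concrete, here is a rough, standalone sketch of the scan
calculation (plain userspace C with made-up names and example numbers;
zswap_scan_target, lru_objects and the values in main() are all
hypothetical, not anything from mm/zswap.c):

/*
 * Hedged sketch of the proportional-reclaim idea above:
 * scan = lruvec_size - lruvec_size * protection / (cgroup_size + 1)
 * with lru_objects standing in for lruvec_size.
 */
#include <stdio.h>

static unsigned long zswap_scan_target(unsigned long lru_objects,
				       unsigned long protection,
				       unsigned long cgroup_size)
{
	/* protection: from memory.min/low; cgroup_size: memory usage. */
	if (protection >= cgroup_size + 1)
		return 0;	/* fully protected: nothing to scan */

	return lru_objects - lru_objects * protection / (cgroup_size + 1);
}

int main(void)
{
	/* 1000 zswap objects on the LRU, cgroup using 4000 pages. */
	printf("no protection:   %lu\n", zswap_scan_target(1000, 0, 4000));
	printf("half protected:  %lu\n", zswap_scan_target(1000, 2000, 4000));
	printf("fully protected: %lu\n", zswap_scan_target(1000, 5000, 4000));
	return 0;
}

The shrink worker could then write back up to that many objects from
the (memcg, node) LRU before moving on to the next memcg in the
round-robin.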
