linux-kernel - Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAF8kJuP4o3Ebg-aYKF9=ftbdu97RnkXnYL-8RuVjYAeXd+ng3A@mail.gmail.com>
Date: Wed, 17 Jan 2024 15:41:35 -0800
From: Chris Li <chrisl@...nel.org>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Chengming Zhou <zhouchengming@...edance.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	linux-kernel@...r.kernel.org, Johannes Weiner <hannes@...xchg.org>, linux-mm@...ck.org, 
	Nhat Pham <nphamcs@...il.com>
Subject: Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree

Hi Yosry and Chengming,


On Wed, Jan 17, 2024 at 10:38 AM Yosry Ahmed <yosryahmed@...gle.com> wrote:
>
> On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou
> <zhouchengming@...edance.com> wrote:
> >
> > When testing the zswap performance by using kernel build -j32 in a tmpfs
> > directory, I found the scalability of zswap rb-tree is not good, which
> > is protected by the only spinlock. That would cause heavy lock contention
> > if multiple tasks zswap_store/load concurrently.
> >
> > So a simple solution is to split the only one zswap rb-tree into multiple
> > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
> > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
> >
> > Although this method can't solve the spinlock contention completely, it
> > can mitigate much of that contention. Below is the results of kernel build
> > in tmpfs with zswap shrinker enabled:
> >
> >      linux-next  zswap-lock-optimize
> > real 1m9.181s    1m3.820s
> > user 17m44.036s  17m40.100s
> > sys  7m37.297s   4m54.622s
> >
> > So there are clearly improvements. And it's complementary with the ongoing
> > zswap xarray conversion by Chris. Anyway, I think we can also merge this
> > first, it's complementary IMHO. So I just refresh and resend this for
> > further discussion.
>

Sorry I have been radio silent busying on a few refreshments of the
xarray on the recent kernel tree. There is an assertion triggered on
xarray and the rb tree does not agree with each other. It takes some
time to debug. I ironed that out, also glad the assert did catch a
bug.

Currently the xarray patch should have everything it takes to use RCU
read lock. However taking out the tree spinlock is more work than
previously. If we are going to remove the tree spinlock, I think we
should revert back to doing a zswap tree lookup and return the zswap
entry with reference increased. The tree mapping can still decouple
from the zswap entry reference count drop to zero.  Anyway, my V1 of
the xarray patch will not include removing the tree spinlock.

> The reason why I think we should wait for the xarray patch(es) is
> there is a chance we may see less improvements from splitting the tree
> if it was an xarray. If we merge this series first, there is no way to
> know.
>
> Chris, do you intend to send the xarray patch(es) anytime soon?

Thanks for the heads up. Let me send it out now.

Chris