Message-ID: <CAF8kJuM4ybP+4_3zssCfV3-Vf9_gE2P7jiOcD9OGgT4JjFC0bg@mail.gmail.com>
Date: Fri, 19 Jan 2024 03:59:07 -0800
From: Chris Li <chrisl@...nel.org>
To: Chengming Zhou <zhouchengming@...edance.com>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	Wei Xu <weixugc@...gle.com>, Yu Zhao <yuzhao@...gle.com>, 
	Greg Thelen <gthelen@...gle.com>, Chun-Tse Shao <ctshao@...gle.com>, 
	Suren Baghdasaryan <surenb@...gle.com>, 
	Brian Geffon <bgeffon@...gle.com>, Minchan Kim <minchan@...nel.org>, Michal Hocko <mhocko@...e.com>, 
	Mel Gorman <mgorman@...hsingularity.net>, Huang Ying <ying.huang@...el.com>, 
	Nhat Pham <nphamcs@...il.com>, Johannes Weiner <hannes@...xchg.org>, Kairui Song <kasong@...cent.com>, 
	Zhongkun He <hezhongkun.hzk@...edance.com>, Kemeng Shi <shikemeng@...weicloud.com>, 
	Barry Song <v-songbaohua@...o.com>, "Matthew Wilcox (Oracle)" <willy@...radead.org>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH 0/2] RFC: zswap tree use xarray instead of RB tree

On Fri, Jan 19, 2024 at 3:12 AM Chengming Zhou
<zhouchengming@...edance.com> wrote:
>
> On 2024/1/19 18:26, Chris Li wrote:
> > On Thu, Jan 18, 2024 at 10:19 PM Chengming Zhou
> > <zhouchengming@...edance.com> wrote:
> >>
> >> On 2024/1/19 12:59, Chris Li wrote:
> >>> On Wed, Jan 17, 2024 at 11:35 PM Chengming Zhou
> >>> <zhouchengming@...edance.com> wrote:
> >>>
> >>>>>>>                     mm-stable           zswap-split-tree    zswap-xarray
> >>>>>>> real                1m10.442s           1m4.157s            1m9.962s
> >>>>>>> user                17m48.232s          17m41.477s          17m45.887s
> >>>>>>> sys                 8m13.517s           5m2.226s            7m59.305s
> >>>>>>>
> >>>>>>> Looks like the contention of concurrency is still there, I haven't
> >>>>>>> looked into the code yet, will review it later.
> >>>>>
> >>>>> Thanks for the quick test. Interesting to see the sys usage drop for
> >>>>> the xarray case even with the spin lock.
> >>>>> Not sure if the 13 second saving is statistically significant or not.
> >>>>>
> >>>>> We might need to have both xarray and split trees for the zswap. It is
> >>>>> likely removing the spin lock wouldn't be able to make up the 35%
> >>>>> difference. That is just my guess. There is only one way to find out.
> >>>>
> >>>> Yes, I totally agree with this! IMHO, concurrent zswap_store paths still
> >>>> have to contend for the xarray spinlock even though we would have converted
> >>>> the rb-tree to the xarray structure at last. So I think we should have both.
> >>>>
> >>>>>
> >>>>> BTW, do you have a script I can run to replicate your results?
> >>>
> >>> Hi Chengming,
> >>>
> >>> Thanks for your script.
> >>>
> >>>>
> >>>> ```
> >>>> #!/bin/bash
> >>>>
> >>>> testname="build-kernel-tmpfs"
> >>>> cgroup="/sys/fs/cgroup/$testname"
> >>>>
> >>>> tmpdir="/tmp/vm-scalability-tmp"
> >>>> workdir="$tmpdir/$testname"
> >>>>
> >>>> memory_max="$((2 * 1024 * 1024 * 1024))"
> >>>>
> >>>> linux_src="/root/zcm/linux-6.6.tar.xz"
> >>>> NR_TASK=32
> >>>>
> >>>> swapon ~/zcm/swapfile
> >>>
> >>> How big is your swapfile here?
> >>
> >> The swapfile is big enough here, I use a 50GB swapfile.
> >
> > Thanks,
> >
> >>
> >>>
> >>> It seems you have only one swapfile there. That can explain the contention.
> >>> Have you tried multiple swapfiles for the same test?
> >>> That should reduce the contention without using your patch.
> >> Do you mean to have many 64MB swapfiles to swapon at the same time?
> >
> > 64MB is too small. There is a limit of MAX_SWAPFILES, which is less
> > than (32 - n) swap files.
> > If you want to use 50G of swap space, you can have MAX_SWAPFILES
> > swapfiles, each 50GB / MAX_SWAPFILES in size.
>
> Right.
>
> >
> >> Maybe it's feasible to test,
> >
> > Of course it is testable, I am curious to see the test results.
> >
> >> I'm not sure how swapout will choose.
> >
> > It will rotate through the same priority swap files first.
> > swapfile.c: get_swap_pages().
> >
> >> But in our usecase, we normally have only one swapfile.
> >
> > Is there a good reason why you can't use more than one swapfile?
>
> I think no, but it seems an unneeded change/burden to our admin.
> So I just tested and optimized for the normal case.

I understand. Just saying it is not really a kernel limitation per se.
I blame the user space :-)
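For reference, a setup with multiple equal-priority swapfiles, so that
get_swap_pages() rotates among them and the per-swapfile
swap_info_struct lock contention is spread out, might look roughly like
this (untested sketch; paths, file count, and sizes are illustrative,
and it needs root):

```shell
#!/bin/bash
# Sketch: split a 50G swap budget across N swapfiles instead of one.
# N must stay below MAX_SWAPFILES. Same -p priority for every file
# means the allocator round-robins among them.

total_gb=50
nfiles=8
per_file_gb=$(( total_gb / nfiles ))   # 50 / 8 -> 6G each (truncated)

for i in $(seq 1 "$nfiles"); do
    f="/swap/swapfile$i"
    fallocate -l "${per_file_gb}G" "$f"
    chmod 600 "$f"
    mkswap "$f"
    swapon -p 0 "$f"    # equal priority => rotate, not fill-then-spill
done

swapon --show
```

With different priorities the kernel would fill the highest-priority
file first instead of rotating, so keeping them equal is the point here.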

>
> > One swapfile will not take the full advantage of the existing code.
> > Even if you split the zswap trees within a swapfile. With only one
> > swapfile, you will still be having lock contention on "(struct
> > swap_info_struct).lock".
> > It is one lock per swapfile.
> > Using more than one swap file should get you better results.
>
> IIUC, we already have the per-cpu swap entry cache to not contend for
> this lock? And I don't see much hot of this lock in the testing.

Yes. The swap entry cache helps. The cache batching also causes other
problems, e.g. the long tail in swap fault handling.
Shameless plug: I posted a patch earlier to address the swap fault
long-tail latencies.

https://lore.kernel.org/linux-mm/20231221-async-free-v1-1-94b277992cb0@kernel.org/T/

Chris
