Message-ID: <CACePvbVXQWgcPD-bgK7iDba4NFLo2tT89ZbLOa03maJU4er4ag@mail.gmail.com>
Date: Tue, 10 Feb 2026 15:17:47 -0800
From: Chris Li <chrisl@...nel.org>
To: Nhat Pham <nphamcs@...il.com>
Cc: akpm@...ux-foundation.org, hannes@...xchg.org, hughd@...gle.com,
yosry.ahmed@...ux.dev, mhocko@...nel.org, roman.gushchin@...ux.dev,
shakeel.butt@...ux.dev, muchun.song@...ux.dev, len.brown@...el.com,
chengming.zhou@...ux.dev, kasong@...cent.com, huang.ying.caritas@...il.com,
ryan.roberts@....com, shikemeng@...weicloud.com, viro@...iv.linux.org.uk,
baohua@...nel.org, bhe@...hat.com, osalvador@...e.de,
christophe.leroy@...roup.eu, pavel@...nel.org, linux-mm@...ck.org,
kernel-team@...a.com, linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
linux-pm@...r.kernel.org, peterx@...hat.com, riel@...riel.com,
joshua.hahnjy@...il.com, npache@...hat.com, gourry@...rry.net,
axelrasmussen@...gle.com, yuanchu@...gle.com, weixugc@...gle.com,
rafael@...nel.org, jannh@...gle.com, pfalcato@...e.de,
zhengqi.arch@...edance.com
Subject: Re: [PATCH v3 00/20] Virtual Swap Space
On Tue, Feb 10, 2026 at 10:00 AM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Mon, Feb 9, 2026 at 4:20 AM Chris Li <chrisl@...nel.org> wrote:
> >
> > On Sun, Feb 8, 2026 at 4:15 PM Nhat Pham <nphamcs@...il.com> wrote:
> > >
> > > My sincerest apologies - it seems like the cover letter (and just the
> > > cover letter) fails to be sent out, for some reason. I'm trying to figure
> > > out what happened - it works when I send the entire patch series to
> > > myself...
> > >
> > > Anyway, resending this (in-reply-to patch 1 of the series):
> >
> > For the record I did receive your original V3 cover letter from the
> > linux-mm mailing list.
>
> I have no idea what happened to be honest. It did not show up on lore
> for a couple of hours, and my coworkers did not receive the cover
> letter email initially. I did not receive any error message or logs
> either - git send-email returns Success to me, and when I checked on
> the web gmail client (since I used a gmail email account), the whole
> series is there.
>
> I tried re-sending a couple times, to no avail. Then, in a couple of
> hours, all of these attempts showed up.
>
> Anyway, this is my bad - I'll be more patient next time. If it does
> not show up for a couple of hours then I'll do some more digging.
No problem. Just wanted to provide more data points in case it helps you
debug your email issue.
> > > Changelog:
> > > * RFC v2 -> v3:
> > > * Implement a cluster-based allocation algorithm for virtual swap
> > > slots, inspired by Kairui Song and Chris Li's implementation, as
> > > well as Johannes Weiner's suggestions. This eliminates the lock
> > > contention issues on the virtual swap layer.
> > > * Re-use swap table for the reverse mapping.
> > > * Remove CONFIG_VIRTUAL_SWAP.
> > > * Reducing the size of the swap descriptor from 48 bytes to 24
> >
> > Is the per-swap-slot entry overhead 24 bytes in your implementation?
> > The current swap overhead is 3 bytes static + 8 bytes dynamic; your 24
> > bytes dynamic is a big jump. You can argue that 8 -> 24 is not a big
> > jump. But it is an unnecessary price compared to the alternative, which
> > is 8 bytes dynamic + 4 bytes (optional redirect).
>
> It depends on the case - you can check the memory overhead discussion below :)
I think the "24B dynamic" sums up the VS memory overhead pretty well
without going into the detail tables. You can drive from case
discussion from that.
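As a back-of-envelope illustration of those per-slot figures (the 64 GiB
swap size, 4 KiB slots, and the occupancy fractions below are my own
assumptions for the sake of the arithmetic, not numbers from this thread):

```python
# Compare per-slot swap metadata overhead using the figures quoted in this
# thread. Swap size and occupancy fractions are illustrative assumptions.
PAGE = 4096
swap_bytes = 64 * 2**30
slots = swap_bytes // PAGE            # 16 Mi slots at 4 KiB per slot

static_per_slot = 3                   # bytes reserved up front (current design)
dynamic_current = 8                   # bytes per in-use slot (current design)
dynamic_vswap = 24                    # bytes per in-use slot (this series)
dynamic_alt = 8 + 4                   # bytes per in-use slot (8 + optional redirect)

def mib(n):
    return n / 2**20

for used_frac in (0.1, 0.5, 1.0):
    used = int(slots * used_frac)
    current = slots * static_per_slot + used * dynamic_current
    vswap = used * dynamic_vswap
    alt = used * dynamic_alt
    print(f"{used_frac:>4.0%} full: current {mib(current):7.1f} MiB, "
          f"vswap {mib(vswap):7.1f} MiB, alt {mib(alt):7.1f} MiB")
```

Under those assumptions the 24 B dynamic scheme beats the current 3 B
static + 8 B dynamic layout at low occupancy (no static reservation), but
costs roughly twice the 8 B + 4 B alternative once most slots are in use.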
> > BTW, I have the following compile error with this series (fedora 43).
> > Same config compile fine on v6.19.
> >
> > In file included from ./include/linux/local_lock.h:5,
> >                  from ./include/linux/mmzone.h:24,
> >                  from ./include/linux/gfp.h:7,
> >                  from ./include/linux/mm.h:7,
> >                  from mm/vswap.c:7:
> > mm/vswap.c: In function ‘vswap_cpu_dead’:
> > ./include/linux/percpu-defs.h:221:45: error: initialization from pointer to non-enclosed address space
> >   221 |         const void __percpu *__vpp_verify = (typeof((ptr) + 0))NULL; \
> >       |                                             ^
> > ./include/linux/local_lock_internal.h:105:40: note: in definition of macro ‘__local_lock_acquire’
> >   105 |                 __l = (local_lock_t *)(lock);                   \
> >       |                                        ^~~~
> > ./include/linux/local_lock.h:17:41: note: in expansion of macro ‘__local_lock’
> >    17 | #define local_lock(lock)        __local_lock(this_cpu_ptr(lock))
> >       |                                 ^~~~~~~~~~~~
> > ./include/linux/percpu-defs.h:245:9: note: in expansion of macro ‘__verify_pcpu_ptr’
> >   245 |         __verify_pcpu_ptr(ptr);                                 \
> >       |         ^~~~~~~~~~~~~~~~~
> > ./include/linux/percpu-defs.h:256:27: note: in expansion of macro ‘raw_cpu_ptr’
> >   256 | #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
> >       |                           ^~~~~~~~~~~
> > ./include/linux/local_lock.h:17:54: note: in expansion of macro ‘this_cpu_ptr’
> >    17 | #define local_lock(lock)        __local_lock(this_cpu_ptr(lock))
> >       |                                              ^~~~~~~~~~~~
> > mm/vswap.c:1518:9: note: in expansion of macro ‘local_lock’
> >  1518 |         local_lock(&percpu_cluster->lock);
> >       |         ^~~~~~~~~~
>
> Ah that's strange. It compiled on all of my setups (I tested with a couple
> different ones), but I must have missed some cases. Would you mind
> sharing your configs so that I can reproduce this compilation error?
See attached config.gz. It is also possible that a newer gcc version
contributes to the error. Anyway, it is preventing me from stress
testing your series on my setup.
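For what it's worth, a diagnostic of this shape ("non-enclosed address
space" coming out of __verify_pcpu_ptr()) is what newer GCC emits when
local_lock() is handed a pointer that does not carry the __percpu
qualifier; recent compilers can check the per-CPU address space for real
on x86. A kernel-style pseudocode sketch of the suspected shape, with
hypothetical names standing in for whatever mm/vswap.c actually declares:

```c
/*
 * Sketch only (not compiled); hypothetical names, loosely modeled on
 * the error above. Newer GCC can treat __percpu as a real named
 * address space, so this_cpu_ptr()/local_lock() reject plain pointers.
 */

/* Suspected shape of the failing code: pointer without __percpu */
struct percpu_cluster *percpu_cluster = /* ... */;
local_lock(&percpu_cluster->lock);   /* __verify_pcpu_ptr() errors out */

/* Shape the per-CPU API expects: a true per-CPU variable ... */
static DEFINE_PER_CPU(struct percpu_cluster, vswap_cluster);
local_lock(&vswap_cluster.lock);     /* macro applies this_cpu_ptr() */

/* ... or a pointer annotated with the qualifier */
struct percpu_cluster __percpu *clusters;
local_lock(&clusters->lock);
```

Whether this is the actual cause here is a guess from the error text; the
attached config plus the gcc version would confirm it.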
>
> >
> > > 1. Kernel building: 52 workers (one per processor), memory.max = 3G.
> > >
> > > Using zswap as the backend:
> > >
> > > Baseline:
> > > real: mean: 185.2s, stdev: 0.93s
> > > sys: mean: 683.7s, stdev: 33.77s
> > >
> > > Vswap:
> > > real: mean: 184.88s, stdev: 0.57s
> > > sys: mean: 675.14s, stdev: 32.8s
> >
> > Can you show your user space time as well to complete the picture?
>
> Will do next time! I used to include user time as well, but I noticed
> that folks (for e.g see [1]) only include systime, not even real time,
> so I figure nobody cares about user time :)
>
> (I still include real time because some of my past work improves sys
> time but regresses real time, so I figure that's relevant).
>
> [1]: https://lore.kernel.org/linux-mm/20260128-swap-table-p3-v2-0-fe0b67ef0215@tencent.com/
>
> But yeah no big deal. I'll dig through my logs to see if I still have
> the numbers, but if not I'll include it in next version.
Mostly I want to get an impression of how hard you are pushing the swap
test cases.
>
> >
> > How many runs do you have for stdev 32.8s?
>
> 5 runs! I average out the result of 5 runs.
The stdev is 33 seconds. Averaging 5 runs is not enough samples to get
you to 1.5% resolution (8 seconds); the difference falls within the
range of noise.
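To put a number on that (a back-of-envelope check using only the
mean/stdev/n quoted above and a Welch-style standard error of the
difference; no external stats library):

```python
import math

# Summary stats from the thread: sys time, mean and stdev over n = 5 runs.
n = 5
base_mean, base_sd = 683.7, 33.77
vswap_mean, vswap_sd = 675.14, 32.8

# Standard error of each mean, then of the difference between the means.
se_base = base_sd / math.sqrt(n)
se_vswap = vswap_sd / math.sqrt(n)
se_diff = math.sqrt(se_base**2 + se_vswap**2)

diff = base_mean - vswap_mean
print(f"observed difference:      {diff:.2f}s")
print(f"std. error of difference: {se_diff:.2f}s")
print(f"t-statistic:              {diff / se_diff:.2f}")
```

A t-statistic around 0.4 on roughly 8 degrees of freedom is nowhere near
the ~2.3 needed for significance at the 5% level, so the ~8.5 s sys-time
delta is indistinguishable from noise at n = 5.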
> > I strongly suspect there is some performance difference that hasn't
> > been covered by your test yet. Need more confirmation by others on the
> > performance measurement. Swap testing is tricky: you want to push the
> > stress to barely within the OOM limit. Need more data.
>
> Very fair point :) I will say though - the kernel build test, with the
> memory.max limit set, does generate a sizable amount of swapping, and
> does OOM if you don't set up swap. Take my word for now, but I will
> try to include average per-run (z)swap activity stats (zswpout, zswpin,
> etc.) in future versions if you're interested :)
Including the user space time will help determine the level of swap
pressure as well. I don't need the absolute zswpout count just yet.
> I've been trying to run more stress tests to trigger crashes and
> performance regression. One of the big reasons why I haven't sent
> anything til now is to fix obvious performance issues (the
> aforementioned lock contention) and bugs. It's a complicated piece of
> work.
>
> As always, would love to receive code/design feedback from you (and
> Kairui, and other swap reviewers), and I would appreciate very much if
> other swap folks can play with the patch series on their setup as well
> for performance testing, or let me know if there is any particular
> case that they're interested in :)
I understand Kairui has some measurements that show regressions.
If you can fix the compile error, I can do some stress testing myself
to provide more data points.
Thanks
Chris
Download attachment "config.gz" of type "application/gzip" (41914 bytes)