[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAMgjq7D9Z=u2J18DExmzeU8fRbvqNwyC3tem2aykAsm79=QGEA@mail.gmail.com>
Date: Wed, 9 Apr 2025 00:59:24 +0800
From: Kairui Song <ryncsn@...il.com>
To: Nhat Pham <nphamcs@...il.com>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org, hannes@...xchg.org,
hughd@...gle.com, yosry.ahmed@...ux.dev, mhocko@...nel.org,
roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev,
len.brown@...el.com, chengming.zhou@...ux.dev, chrisl@...nel.org,
huang.ying.caritas@...il.com, ryan.roberts@....com, viro@...iv.linux.org.uk,
baohua@...nel.org, osalvador@...e.de, lorenzo.stoakes@...cle.com,
christophe.leroy@...roup.eu, pavel@...nel.org, kernel-team@...a.com,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
linux-pm@...r.kernel.org
Subject: Re: [RFC PATCH 00/14] Virtual Swap Space
On Wed, Apr 9, 2025 at 12:48 AM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Tue, Apr 8, 2025 at 9:23 AM Kairui Song <ryncsn@...il.com> wrote:
> >
> >
> > Thanks for sharing the code, my initial idea after the discussion at
> > LSFMM is that there is a simple way to combine this with the "swap
> > table" [1] design of mine to solve the performance issue of this
> > series: just store the pointer of this struct in the swap table. It's
> > a bruteforce and glue like solution but the contention issue will be
> > gone.
>
> Was waiting for your submission, but I figured I should send what I
> had out first for immediate feedback :)
>
> Johannes actually proposed something similar to your physical swap
> allocator for the virtual swap slots allocation logic, to solve our
> lock contention problem. My apologies - I should have name-dropped you
> in the RFC cover as well (the cover was a bit outdated, and I haven't
> updated the newest developments that came from the LSFMMBPF
> conversation in the cover letter).
>
> >
> > Of course it's not a good approach, ideally the data structure can be
> > simplified to an entry type in the swap table. The swap table series
> > handles locking and synchronizations using either cluster lock
> > (reusing swap allocator and existing swap logics) or folio lock (kind
> > of like page cache). So many parts can be much simplified, I think it
> > will be at most ~32 bytes per page with a virtual device (including
> > the intermediate pointers).Will require quite some work though.
> >
> > The good side with that approach is we will have a much lower memory
> > overhead and even better performance. And the virtual space part will
> > be optional, for non virtual setup the memory consumption will be only
> > 8 bytes per page and also dynamically allocated, as discussed at
> > LSFMM.
>
> I think one problem with your design, which I alluded to at the
> conference, is that it doesn't quite work for our requirements -
> namely the separation of zswap from its underlying backend.
>
> All the metadata HAVE to live at the virtual layer. For once, we are
> duplicating the logic if we push this to the backend.
>
> But more than that, there are lifetime operations that HAVE to be
> backend-agnostic. For instance, on the swap out path, when we unmap
> the page from the page table, we do swap_duplicate() (i.,e increasing
> the swap count/reference count of the swap entries). At that point, we
> have not (and cannot) make a decision regarding the backend storage
> yet, and thus does not have any backend-specific places to hold this
> piece of information. If we couple all the backends then yeah sure we
> can store it at the physical swapfile level, but that defeats the
> purpose of swap virtualization :)
Ah, now I get why you have to store the data in the virtual layer.
I was thinking that doing it in the physical layer will make it easier
to reuse what swap already has. But if you need to be completely
backend-agnostic, then just keep it in the virtual layer. Seems not a
foundunmentail issue, it could be worked out in some way I think. eg.
using another table type. I'll check if that would work after I've
done the initial parts.
>
> >
> > So sorry that I still have a few parts undone, looking forward to
> > posting in about one week, eg. After this weekend it goes well. I'll
> > also try to check your series first to see how these can be
> > collaborated better.
>
> Of course, I'm not against collaboration :) As I mentioned earlier, we
> need more work on the allocation part, which your physical swapfile
> allocator should either work, or serve as the inspiration for.
>
> Cheers,
> Nhat
Powered by blists - more mailing lists