[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKEwX=Npoa5pxeqOHqNTreqVEKpnFQ2rg=yN6eOgFQngBp6X+w@mail.gmail.com>
Date: Tue, 17 Oct 2023 12:03:10 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: akpm@...ux-foundation.org, hannes@...xchg.org,
cerasuolodomenico@...il.com, sjenning@...hat.com,
ddstreet@...e.org, vitaly.wool@...sulko.com, hughd@...gle.com,
corbet@....net, konrad.wilk@...cle.com, senozhatsky@...omium.org,
rppt@...nel.org, linux-mm@...ck.org, kernel-team@...a.com,
linux-kernel@...r.kernel.org, david@...t.cz
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
On Mon, Oct 16, 2023 at 5:58 PM Yosry Ahmed <yosryahmed@...gle.com> wrote:
>
> On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham <nphamcs@...il.com> wrote:
> >
> > Currently, when a zswap store attempt fails, the page is immediately
> > swapped out. This could happen for a variety of reasons. For instance,
> > the compression algorithm could fail (such as when the data is not
> > compressible), or the backend allocator might not be able to find a
> > suitable slot for the compressed page. If these pages are needed
> > later on, users will incur IOs from swapins.
> >
> > This issue prevents the adoption of zswap for potential users who
> > cannot tolerate the latency associated with swapping. In many cases,
> > these IOs are avoidable if we just keep in memory the pages that zswap
> > fail to store.
> >
> > This patch series add two new features for zswap that will alleviate
> > the risk of swapping:
> >
> > a) When a store attempt fail, keep the page untouched in memory
> > instead of swapping it out.
>
> What about writeback when the zswap limit is hit? I understand the
> problem, but I am wondering if this is the correct way of fixing it.
> We really need to make zswap work without a backing swapfile, which I
> think is the correct way to fix all these problems. I was working on
> that, but unfortunately I had to pivot to something else before I had
> something that was working.
>
> At Google, we have "ghost" swapfiles that we use just to use zswap
> without a swapfile. They are sparse files, and we have internal kernel
> patches to flag them and never try to actually write to them.
>
> I am not sure how many bandaids we can afford before doing the right
> thing. I understand it's a much larger surgery, perhaps there is a way
> to get a short-term fix that is also a step towards the final state we
> want to reach instead?
Regarding the writeback - I'll make sure to also short-circuit writeback
when the bypass_swap option is enabled in v2 :) I'll probably send out
v2 after
I absolutely agree that we must decouple zswap and swap (and would
be happy to help out in any capacity I could - we have heard similar
concerns/complaints about swap wastage from internal parties as well).
However, as Johannes has pointed out, this feature still has its place,
given our already existing swapfile deployments. I do agree that a
global knob is insufficient tho. I'll add a per-cgroup knob in v2 so that
we can enable/disable this feature on a per-workload basis.
>
> >
> > b) If the store attempt fails at the compression step, allow the page
> > to be stored in its uncompressed form in the zswap pool. This maintains
> > the LRU ordering of pages, which will be helpful for accurate
> > memory reclaim (zswap writeback in particular).
>
> This is dangerous. Johannes and I discussed this before. This means
> that reclaim can end up allocating more memory instead of freeing.
> Allocations made in the reclaim path are made under the assumption
> that we will eventually free memory. In this case, we won't. In the
> worst case scenario, reclaim can leave the system/memcg in a worse
> state than before it started.
>
> Perhaps there is a way we can do this without allocating a zswap entry?
>
> I thought before about having a special list_head that allows us to
> use the lower bits of the pointers as markers, similar to the xarray.
> The markers can be used to place different objects on the same list.
> We can have a list that is a mixture of struct page and struct
> zswap_entry. I never pursued this idea, and I am sure someone will
> scream at me for suggesting it. Maybe there is a less convoluted way
> to keep the LRU ordering intact without allocating memory on the
> reclaim path.
Hmm yeah you're right about these concerns. That seems like a lot more
involved than what I envisioned initially.
Let's put this aside for now. I'll just send the first patch in v2, and we can
work on + discuss more about uncompressed pages storing later on.
>
> >
> > These features could be enabled independently via two new zswap module
> > parameters.
> >
> > Nhat Pham (2):
> > swap: allows swap bypassing on zswap store failure
> > zswap: store uncompressed pages when compression algorithm fails
> >
> > Documentation/admin-guide/mm/zswap.rst | 16 +++++++
> > include/linux/zswap.h | 9 ++++
> > mm/page_io.c | 6 +++
> > mm/shmem.c | 8 +++-
> > mm/zswap.c | 64 +++++++++++++++++++++++---
> > 5 files changed, 95 insertions(+), 8 deletions(-)
> >
> > --
> > 2.34.1
Powered by blists - more mailing lists