[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250307013559.GA423735@cmpxchg.org>
Date: Thu, 6 Mar 2025 20:35:59 -0500
From: Johannes Weiner <hannes@...xchg.org>
To: Nhat Pham <nphamcs@...il.com>
Cc: akpm@...ux-foundation.org, yosryahmed@...gle.com, yosry.ahmed@...ux.dev,
chengming.zhou@...ux.dev, linux-mm@...ck.org, kernel-team@...a.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] page_io: zswap: do not crash the kernel on
decompression failure
On Thu, Mar 06, 2025 at 12:50:10PM -0800, Nhat Pham wrote:
> Currently, we crash the kernel when a decompression failure occurs in
> zswap (either because of memory corruption, or a bug in the compression
> algorithm). This is overkill. We should only SIGBUS the unfortunate
> process asking for the zswap entry on zswap load, and skip the corrupted
> entry in zswap writeback.
>
> See [1] for a recent upstream discussion about this.
>
> The zswap writeback case is relatively straightforward to fix. For the
> zswap_load() case, we change the return behavior:
>
> * Return 0 on success.
> * Return -ENOENT (with the folio locked) if zswap does not own the
> swapped out content.
> * Return -EIO if zswap owns the swapped out content, but encounters a
> decompression failure for some reasons. The folio will be unlocked,
> but not be marked up-to-date, which will eventually cause the process
> requesting the page to SIGBUS (see the handling of not-up-to-date
> folio in do_swap_page() in mm/memory.c), without crashing the kernel.
> * Return -EINVAL if we encounter a large folio, as large folio should
> not be swapped in while zswap is being used. Similar to the -EIO case,
> we also unlock the folio but do not mark it as up-to-date to SIGBUS
> the faulting process.
>
> As a side effect, we require one extra zswap tree traversal in the load
> and writeback paths. Quick benchmarking on a kernel build test shows no
> performance difference:
>
> With the new scheme:
> real: mean: 125.1s, stdev: 0.12s
> user: mean: 3265.23s, stdev: 9.62s
> sys: mean: 2156.41s, stdev: 13.98s
>
> The old scheme:
> real: mean: 125.78s, stdev: 0.45s
> user: mean: 3287.18s, stdev: 5.95s
> sys: mean: 2177.08s, stdev: 26.52s
>
> [1]: https://lore.kernel.org/all/ZsiLElTykamcYZ6J@casper.infradead.org/
>
> Suggested-by: Matthew Wilcox <willy@...radead.org>
> Suggested-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
> Suggested-by: Johannes Weiner <hannes@...xchg.org>
> Signed-off-by: Nhat Pham <nphamcs@...il.com>
Acked-by: Johannes Weiner <hannes@...xchg.org>
Powered by blists - more mailing lists