Message-ID: <20250226045727.GB1775487@cmpxchg.org>
Date: Tue, 25 Feb 2025 23:57:27 -0500
From: Johannes Weiner <hannes@...xchg.org>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Nhat Pham <nphamcs@...il.com>, akpm@...ux-foundation.org,
chengming.zhou@...ux.dev, linux-mm@...ck.org, kernel-team@...a.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] zswap: do not crash the kernel on decompression failure

On Wed, Feb 26, 2025 at 03:12:35AM +0000, Yosry Ahmed wrote:
> On Tue, Feb 25, 2025 at 01:32:00PM -0800, Nhat Pham wrote:
> > Currently, we crash the kernel when a decompression failure occurs in
> > zswap (either because of memory corruption, or a bug in the compression
> > algorithm). This is overkill. We should only SIGBUS the unfortunate
> > process asking for the zswap entry on zswap load, and skip the corrupted
> > entry in zswap writeback.
>
> Some relevant observations/questions, but not really actionable for this
> patch, perhaps some future work, or more likely some incoherent
> illogical thoughts:
>
> (1) It seems like not making the folio uptodate will cause shmem faults
> to mark the swap entry as hwpoisoned, but I don't see similar handling
> for do_swap_page(). So it seems like even if we SIGBUS the process,
> other processes mapping the same page could follow in the same
> footsteps.

It's analogous to what __end_swap_bio_read() does for block backends,
so it's hitchhiking on the standard swap protocol for read failures.
The page sticks around if there are other users. It can get reclaimed,
but since it's not marked dirty, it won't get overwritten. Another
access will either find it in the swapcache and die on !uptodate, or,
if it was reclaimed, attempt another decompression. If all
references have been killed, zswap_invalidate() will finally drop it.
Swapoff actually poisons the page table as well (unuse_pte).
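
For reference, the fault-side check is roughly this (paraphrased from
do_swap_page(), not verbatim):

	/* after the folio has been located/read into the swapcache */
	if (unlikely(!folio_test_uptodate(folio))) {
		/*
		 * The read (or decompression) failed, so the folio was
		 * never marked uptodate and never dirtied; reclaim won't
		 * write it back over the still-intact swap copy.
		 */
		ret = VM_FAULT_SIGBUS;
		goto out_nomap;
	}
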
> (2) A hwpoisoned swap entry results in VM_FAULT_SIGBUS in some cases
> (e.g. shmem_fault() -> shmem_get_folio_gfp() -> shmem_swapin_folio()),
> even though we have VM_FAULT_HWPOISON. This patch falls under this
> bucket, but unfortunately we cannot tell for sure if it's a hwpoison or
> a decompression bug.

Are you sure? Actual memory failure should replace the ptes of a
mapped shmem page with TTU_HWPOISON, which turns them into special
swap entries that trigger VM_FAULT_HWPOISON in do_swap_page().
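
That path is roughly this in try_to_unmap_one() (paraphrased from
memory, details elided):

	if (folio_test_hwpoison(folio) && (flags & TTU_HWPOISON)) {
		/* replace the present pte with a hwpoison swap entry */
		pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
		set_pte_at(mm, address, pvmw.pte, pteval);
	}
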
Anon swap distinguishes as long as the swapfile is there. Swapoff
installs poison markers, which are then handled the same in future
faults (VM_FAULT_HWPOISON):

/*
* "Poisoned" here is meant in the very general sense of "future accesses are
* invalid", instead of referring very specifically to hardware memory errors.
* This marker is meant to represent any of various different causes of this.
*
* Note that, when encountered by the faulting logic, PTEs with this marker will
* result in VM_FAULT_HWPOISON and thus regardless trigger hardware memory error
* logic.
*/
#define PTE_MARKER_POISONED BIT(1)
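
So both the genuine hwpoison entry and the marker funnel into the same
fault-side result; sketched from do_swap_page() / handle_pte_marker()
(again paraphrased, other entry types elided):

	entry = pte_to_swp_entry(vmf->orig_pte);
	if (unlikely(non_swap_entry(entry))) {
		if (is_hwpoison_entry(entry)) {
			/* page that was mapped when memory failure hit */
			ret = VM_FAULT_HWPOISON;
		} else if (is_pte_marker_entry(entry)) {
			/* PTE_MARKER_POISONED, e.g. installed by swapoff */
			ret = handle_pte_marker(vmf); /* -> VM_FAULT_HWPOISON */
		}
		/* migration/device-private entries etc. elided */
	}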