linux-kernel - Re: [PATCH 1/4] mm/shmem, swap: improve cached mTHP handling and fix potential hung

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAMgjq7BLKv8d5+TNbEqSiPSteJvjTBsbphwDsxdR4Mk0gj7C7g@mail.gmail.com>
Date: Wed, 18 Jun 2025 10:11:21 +0800
From: Kairui Song <ryncsn@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, Hugh Dickins <hughd@...gle.com>, 
	Baolin Wang <baolin.wang@...ux.alibaba.com>, Matthew Wilcox <willy@...radead.org>, 
	Kemeng Shi <shikemeng@...weicloud.com>, Chris Li <chrisl@...nel.org>, 
	Nhat Pham <nphamcs@...il.com>, Baoquan He <bhe@...hat.com>, Barry Song <baohua@...nel.org>, 
	linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/4] mm/shmem, swap: improve cached mTHP handling and fix
 potential hung

On Wed, Jun 18, 2025 at 6:58 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Wed, 18 Jun 2025 02:35:00 +0800 Kairui Song <ryncsn@...il.com> wrote:
>
> > From: Kairui Song <kasong@...cent.com>
> >
> > The current swap-in code assumes that, when a swap entry in shmem
> > mapping is order 0, its cached folios (if present) must be order 0
> > too, which turns out not always correct.
> >
> > The problem is shmem_split_large_entry is called before verifying the
> > folio will eventually be swapped in, one possible race is:
> >
> >     CPU1                          CPU2
> > shmem_swapin_folio
> > /* swap in of order > 0 swap entry S1 */
> >   folio = swap_cache_get_folio
> >   /* folio = NULL */
> >   order = xa_get_order
> >   /* order > 0 */
> >   folio = shmem_swap_alloc_folio
> >   /* mTHP alloc failure, folio = NULL */
> >   <... Interrupted ...>
> >                                  shmem_swapin_folio
> >                                  /* S1 is swapped in */
> >                                  shmem_writeout
> >                                  /* S1 is swapped out, folio cached */
> >   shmem_split_large_entry(..., S1)
> >   /* S1 is split, but the folio covering it has order > 0 now */
> >
> > Now any following swapin of S1 will hang: `xa_get_order` returns 0,
> > and folio lookup will return a folio with order > 0. The
> > `xa_get_order(&mapping->i_pages, index) != folio_order(folio)` will
> > always return false causing swap-in to return -EEXIST.
> >
> > And this looks fragile. So fix this up by allowing seeing a larger folio
> > in swap cache, and check the whole shmem mapping range covered by the
> > swapin have the right swap value upon inserting the folio. And drop
> > the redundant tree walks before the insertion.
> >
> > This will actually improve the performance, as it avoided two redundant
> > Xarray tree walks in the hot path, and the only side effect is that in
> > the failure path, shmem may redundantly reallocate a few folios
> > causing temporary slight memory pressure.
> >
> > And worth noting, it may seems the order and value check before
> > inserting might help reducing the lock contention, which is not true.
> > The swap cache layer ensures raced swapin will either see a swap cache
> > folio or failed to do a swapin (we have SWAP_HAS_CACHE bit even if
> > swap cache is bypassed), so holding the folio lock and checking the
> > folio flag is already good enough for avoiding the lock contention.
> > The chance that a folio passes the swap entry value check but the
> > shmem mapping slot has changed should be very low.
> >
> > Cc: stable@...r.kernel.org
> > Fixes: 058313515d5a ("mm: shmem: fix potential data corruption during shmem swapin")
> > Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
>
> The Fixes: tells -stable maintainers (and others) which kernel versions
> need the fix.  So having two Fixes: against different kernel versions is
> very confusing!  Are we recommending that kernels which contain
> 809bc86517cc but not 058313515d5a be patched?

809bc86517cc introduced mTHP support for shmem but it's buggy, and
058313515d5a tried to fix that, which is also buggy, I thought this
could help people to backport this.

I think keeping either is OK, I'll keep 809bc86517cc then, any branch
having 809bc86517cc should already have 058313515d5a backported.