Message-ID: <2025111053-saddlebag-maybe-0edc@gregkh>
Date: Mon, 10 Nov 2025 10:00:59 +0900
From: Greg KH <gregkh@...uxfoundation.org>
To: kasong@...cent.com
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Kemeng Shi <shikemeng@...weicloud.com>,
Nhat Pham <nphamcs@...il.com>, Baoquan He <bhe@...hat.com>,
Barry Song <baohua@...nel.org>, Chris Li <chrisl@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Yosry Ahmed <yosry.ahmed@...ux.dev>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Youngjun Park <youngjun.park@....com>,
Kairui Song <ryncsn@...il.com>, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] Revert "mm, swap: avoid redundant swap device pinning"
On Mon, Nov 10, 2025 at 02:06:03AM +0800, Kairui Song via B4 Relay wrote:
> From: Kairui Song <kasong@...cent.com>
>
> This reverts commit 78524b05f1a3e16a5d00cc9c6259c41a9d6003ce.
>
> While reviewing recent leaf entry changes, I noticed that commit
> 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning") isn't
> correct. It's true that almost all callers of __read_swap_cache_async
> already hold a swap entry reference, so pinning the same swap device
> again isn't needed. But when there are multiple swap devices, VMA
> readahead (swap_vma_readahead()) may encounter swap entries from a
> different swap device and call __read_swap_cache_async without holding
> a reference to that swap device.
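A minimal sketch of the hazard described above, assuming (as the patch text says) that the caller holds a reference only on the device of the faulting entry. `struct model_entry` and `count_unpinned()` are hypothetical illustration names, not kernel APIs; the function simply counts readahead-window entries whose device differs from the faulting entry's device, i.e. the entries that would reach the helper unpinned before this revert.

```c
#include <stddef.h>

/*
 * Hypothetical model of a swap entry (not the kernel's swp_entry_t):
 * the swap device it lives on and its slot offset on that device.
 */
struct model_entry {
	int type;             /* swap device index, similar to swp_type() */
	unsigned long offset; /* slot within that device */
};

/*
 * Count entries in a readahead window that sit on a different device
 * than the faulting entry. If only the faulting entry's device is
 * pinned by the caller, these entries are read with no device pin.
 */
static size_t count_unpinned(struct model_entry fault,
			     const struct model_entry *win, size_t n)
{
	size_t unpinned = 0;

	for (size_t i = 0; i < n; i++) {
		if (win[i].type != fault.type)
			unpinned++;
	}
	return unpinned;
}
```

For example, with a fault on device 1 and a window of { {1,100}, {1,101}, {0,42}, {1,102}, {0,43} }, count_unpinned() returns 2: the two device-0 entries would be read without any reference on device 0.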
>
> So a use-after-free (UAF) is possible if swapoff of device A races with
> swapin on device B while VMA readahead tries to read swap entries from
> device A. It's not easy to trigger, but in theory it can cause real
> issues. Besides, that commit made swap more vulnerable to issues such
> as corrupted page tables.
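The race above can be modeled, under simplifying assumptions, with a counter plus a "live" flag. In the kernel, get_swap_device() is built on a percpu_ref; this sketch instead uses plain C11 atomics, and every `model_*` name is hypothetical. It shows the contract the revert restores inside the helper: a reader that pinned the device before swapoff started keeps it alive until it drops the pin, and a reader that arrives after swapoff started is refused (like __read_swap_cache_async returning NULL).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a swap device's pin state. */
struct model_swap_dev {
	atomic_bool live; /* cleared when swapoff begins */
	atomic_int users; /* readers currently holding a pin */
};

static void model_dev_init(struct model_swap_dev *d)
{
	atomic_init(&d->live, true);
	atomic_init(&d->users, 0);
}

/* Try to pin the device; fails once swapoff has begun. */
static bool model_get(struct model_swap_dev *d)
{
	atomic_fetch_add(&d->users, 1);
	if (!atomic_load(&d->live)) {
		atomic_fetch_sub(&d->users, 1); /* back out: device going away */
		return false;
	}
	return true;
}

static void model_put(struct model_swap_dev *d)
{
	int old = atomic_fetch_sub(&d->users, 1);

	assert(old > 0); /* catches an unbalanced put */
}

/* swapoff side: refuse new pins first, then wait for users to drain. */
static void model_swapoff_begin(struct model_swap_dev *d)
{
	atomic_store(&d->live, false);
}

static bool model_swapoff_can_free(struct model_swap_dev *d)
{
	return atomic_load(&d->users) == 0;
}

/* Helper that pins the entry's own device for its whole body. */
static bool model_read_entry(struct model_swap_dev *d)
{
	if (!model_get(d))
		return false; /* device is being swapped off */
	/* ... swap cache lookup / folio allocation would happen here ... */
	model_put(d);
	return true;
}
```

Because the reader increments `users` before checking `live`, and swapoff clears `live` before checking `users` (both sequentially consistent), at least one side always observes the other: either the reader backs out, or swapoff sees a nonzero user count and must wait.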
>
> Just revert it. __read_swap_cache_async isn't that performance-sensitive
> after all, as it's mostly used for SSD/HDD swap devices with readahead.
> SYNCHRONOUS_IO devices may fall back to it for entries with a swap count
> greater than 1, but a new helper and routine for such devices is coming
> soon, so they will never touch this helper or pay the redundant swap
> device reference overhead.
>
> Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
> Signed-off-by: Kairui Song <kasong@...cent.com>
> ---
> mm/swap_state.c | 14 ++++++--------
> mm/zswap.c | 8 +-------
> 2 files changed, 7 insertions(+), 15 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 3f85a1c4cfd9..0c25675de977 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -406,13 +406,17 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
> bool skip_if_exists)
> {
> - struct swap_info_struct *si = __swap_entry_to_info(entry);
> + struct swap_info_struct *si;
> struct folio *folio;
> struct folio *new_folio = NULL;
> struct folio *result = NULL;
> void *shadow = NULL;
>
> *new_page_allocated = false;
> + si = get_swap_device(entry);
> + if (!si)
> + return NULL;
> +
> for (;;) {
> int err;
>
> @@ -499,6 +503,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> put_swap_folio(new_folio, entry);
> folio_unlock(new_folio);
> put_and_return:
> + put_swap_device(si);
> if (!(*new_page_allocated) && new_folio)
> folio_put(new_folio);
> return result;
> @@ -518,16 +523,11 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> struct vm_area_struct *vma, unsigned long addr,
> struct swap_iocb **plug)
> {
> - struct swap_info_struct *si;
> bool page_allocated;
> struct mempolicy *mpol;
> pgoff_t ilx;
> struct folio *folio;
>
> - si = get_swap_device(entry);
> - if (!si)
> - return NULL;
> -
> mpol = get_vma_policy(vma, addr, 0, &ilx);
> folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> &page_allocated, false);
> @@ -535,8 +535,6 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>
> if (page_allocated)
> swap_read_folio(folio, plug);
> -
> - put_swap_device(si);
> return folio;
> }
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 5d0f8b13a958..aefe71fd160c 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1005,18 +1005,12 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> struct folio *folio;
> struct mempolicy *mpol;
> bool folio_was_allocated;
> - struct swap_info_struct *si;
> int ret = 0;
>
> /* try to allocate swap cache folio */
> - si = get_swap_device(swpentry);
> - if (!si)
> - return -EEXIST;
> -
> mpol = get_task_policy(current);
> folio = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
> - NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
> - put_swap_device(si);
> + NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
> if (!folio)
> return -ENOMEM;
>
>
> ---
> base-commit: 02dafa01ec9a00c3758c1c6478d82fe601f5f1ba
> change-id: 20251109-revert-78524b05f1a3-04a1295bef8a
>
> Best regards,
> --
> Kairui Song <kasong@...cent.com>
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>