linux-kernel - Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked

Message-ID: <20240206185145.GA97483@cmpxchg.org>
Date: Tue, 6 Feb 2024 19:51:45 +0100
From: Johannes Weiner <hannes@...xchg.org>
To: Nhat Pham <nphamcs@...il.com>
Cc: akpm@...ux-foundation.org, chengming.zhou@...ux.dev,
	yosryahmed@...gle.com, linux-mm@...ck.org, kernel-team@...a.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range
 with the folio locked

On Tue, Feb 06, 2024 at 10:08:55AM -0800, Nhat Pham wrote:
> When a folio is swapped in, the protection size of the corresponding
> zswap LRU is incremented, so that the zswap shrinker is more
> conservative with its reclaiming action. This field is embedded within
> the struct lruvec, so updating it requires looking up the folio's memcg
> and lruvec. However, currently this lookup can happen after the folio is
> unlocked, for instance if a new folio is allocated, and
> swap_read_folio() unlocks the folio before returning. In this scenario,
> there is no stability guarantee for the binding between a folio and its
> memcg and lruvec:
> 
> * A folio's memcg and lruvec can be freed between the lookup and the
>   update, leading to a UAF.
> * Folio migration can clear the now-unlocked folio's memcg_data, which
>   directs the zswap LRU protection size update towards the root memcg
>   instead of the original memcg. This was recently picked up by the
>   syzbot thanks to a warning in the inlined folio_lruvec() call.
> 
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated, to prevent this.
> 
> Reported-by: syzbot+17a611d10af7d18a7092@...kaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@...il.com>

Looks great, thanks for updating it!

One more thing I just realized:

> ---
>  mm/swap_state.c | 10 ++++++----
>  mm/zswap.c      |  1 +
>  2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e671266ad772..7255c01a1e4e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  	/* The page was likely read above, so no need for plugging here */
>  	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
>  					&page_allocated, false);
> -	if (unlikely(page_allocated))
> +	if (unlikely(page_allocated)) {
> +		zswap_folio_swapin(folio);
>  		swap_read_folio(folio, false, NULL);
> -	zswap_folio_swapin(folio);
> +	}
>  	return folio;
>  }
>  
> @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>  	/* The folio was likely read above, so no need for plugging here */
>  	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
>  					&page_allocated, false);
> -	if (unlikely(page_allocated))
> +	if (unlikely(page_allocated)) {
> +		zswap_folio_swapin(folio);
>  		swap_read_folio(folio, false, NULL);
> -	zswap_folio_swapin(folio);
> +	}
>  	return folio;
>  }
>  
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4aea03285532..8c548f73d52e 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -827,6 +827,7 @@ void zswap_folio_swapin(struct folio *folio)
>  	struct lruvec *lruvec;
>  
>  	if (folio) {
> +		VM_WARN_ON_ONCE(!folio_test_locked(folio));
>  		lruvec = folio_lruvec(folio);
>  		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
>  	}

The NULL check is now also no longer necessary.

zswap_folio_swapin() used to be called unconditionally, even if
__read_swap_cache_async() failed and returned NULL.

However, page_allocated == true implies success: in that case the
newly allocated and locked folio is always returned.
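
To make the argument concrete, here is a rough userspace model of the
readahead tail after this patch. It is only a sketch, not kernel code:
the model_* functions, the lookup_outcome enum, the nr_warnings counter
and the cut-down struct folio / struct lruvec are all hypothetical
stand-ins invented for illustration. The model assumes the three
outcomes discussed above: failure (NULL), a swap cache hit, and a newly
allocated folio that comes back locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the real kernel definitions. */
struct lruvec {
	long nr_zswap_protected;	/* models zswap_lruvec_state.nr_zswap_protected */
};

struct folio {
	bool locked;
	struct lruvec *lruvec;		/* models the folio -> lruvec binding */
};

enum lookup_outcome { LOOKUP_FAILED, LOOKUP_CACHE_HIT, LOOKUP_ALLOCATED };

static int nr_warnings;		/* counts would-be VM_WARN_ON_ONCE() hits */

/* Models zswap_folio_swapin() with the NULL check dropped. */
static void model_zswap_folio_swapin(struct folio *folio)
{
	if (!folio->locked)
		nr_warnings++;
	folio->lruvec->nr_zswap_protected++;
}

/* Models __read_swap_cache_async(): NULL on failure, and
 * *page_allocated is set only when a new, locked folio is returned. */
static struct folio *model_read_swap_cache_async(enum lookup_outcome outcome,
						 struct folio *folio,
						 bool *page_allocated)
{
	*page_allocated = false;
	if (outcome == LOOKUP_FAILED)
		return NULL;
	if (outcome == LOOKUP_ALLOCATED) {
		folio->locked = true;
		*page_allocated = true;
	}
	return folio;
}

/* Models swap_read_folio() unlocking the folio before it returns. */
static void model_swap_read_folio(struct folio *folio)
{
	folio->locked = false;
}

/* Models the tail of swap_cluster_readahead() / swap_vma_readahead(). */
static struct folio *model_readahead_tail(enum lookup_outcome outcome,
					  struct folio *folio)
{
	bool page_allocated;
	struct folio *ret;

	ret = model_read_swap_cache_async(outcome, folio, &page_allocated);
	if (page_allocated) {
		/* page_allocated implies success, so ret cannot be NULL here. */
		assert(ret != NULL);
		model_zswap_folio_swapin(ret);	/* update while still locked */
		model_swap_read_folio(ret);
	}
	return ret;
}
```

In this model the swapin helper is only reachable from the
page_allocated branch, so it never sees a NULL folio and always sees a
locked one. The failure and cache-hit paths skip the protection update
entirely.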