linux-kernel - Re: [PATCH mm-new 2/2] mm/shmem: writeout free swap if swap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a6f83e63-c5af-cd27-5379-0e8fcb347e18@google.com>
Date: Fri, 18 Jul 2025 17:51:16 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
cc: Hugh Dickins <hughd@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
    Baoquan He <bhe@...hat.com>, Barry Song <21cnbao@...il.com>, 
    Chris Li <chrisl@...nel.org>, David Rientjes <rientjes@...gle.com>, 
    Kairui Song <ryncsn@...il.com>, Kemeng Shi <shikemeng@...weicloud.com>, 
    Shakeel Butt <shakeel.butt@...ux.dev>, linux-kernel@...r.kernel.org, 
    linux-mm@...ck.org
Subject: Re: [PATCH mm-new 2/2] mm/shmem: writeout free swap if swap_writeout()
 reactivates

On Thu, 17 Jul 2025, Baolin Wang wrote:

> Hi Hugh,
> 
> On 2025/7/16 16:08, Hugh Dickins wrote:
> > If swap_writeout() returns AOP_WRITEPAGE_ACTIVATE (for example, because
> > zswap cannot compress and memcg disables writeback), there is no virtue
> > in keeping that folio in swap cache and holding the swap allocation:
> > shmem_writeout() switch it back to shmem page cache before returning.
> > 
> > Folio lock is held, and folio->memcg_data remains set throughout, so
> > there is no need to get into any memcg or memsw charge complications:
> > swap_free_nr() and delete_from_swap_cache() do as much as is needed (but
> > beware the race with shmem_free_swap() when inode truncated or evicted).
> > 
> > Doing the same for an anonymous folio is harder, since it will usually
> > have been unmapped, with references to the swap left in the page tables.
> > Adding a function to remap the folio would be fun, but not worthwhile
> > unless it has other uses, or an urgent bug with anon is demonstrated.
> > 
> > Signed-off-by: Hugh Dickins <hughd@...gle.com>
> > ---
> >   mm/shmem.c | 33 ++++++++++++++++++++++++++++++++-
> >   1 file changed, 32 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 33675361031b..5a7ce4c8bad6 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1655,6 +1655,7 @@ int shmem_writeout(struct folio *folio, struct
> > swap_iocb **plug,
> >   
> >    if (!folio_alloc_swap(folio, __GFP_HIGH | __GFP_NOMEMALLOC |
> >    __GFP_NOWARN)) {
> >   		bool first_swapped = shmem_recalc_inode(inode, 0, nr_pages);
> > +		int error;
> >   
> >     /*
> >   		 * Add inode to shmem_unuse()'s list of swapped-out inodes,
> > @@ -1675,7 +1676,37 @@ int shmem_writeout(struct folio *folio, struct
> > swap_iocb **plug,
> >     shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap));
> >   
> >   		BUG_ON(folio_mapped(folio));
> > -		return swap_writeout(folio, plug);
> > +		error = swap_writeout(folio, plug);
> > +		if (error != AOP_WRITEPAGE_ACTIVATE) {
> > +			/* folio has been unlocked */
> > +			return error;
> > +		}
> > +
> > +		/*
> > +		 * The intention here is to avoid holding on to the swap when
> > +		 * zswap was unable to compress and unable to writeback; but
> > +		 * it will be appropriate if other reactivate cases are added.
> > +		 */
> > +		error = shmem_add_to_page_cache(folio, mapping, index,
> > +				swp_to_radix_entry(folio->swap),
> > +				__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
> > +		/* Swap entry might be erased by racing shmem_free_swap() */
> > +		if (!error) {
> > +			spin_lock(&info->lock);
> > +			info->swapped -= nr_pages;
> > +			spin_unlock(&info->lock);
> 
> Using the helper 'shmem_recalc_inode(inode, 0, -nr_pages)' seems more
> readable?

Yes, that's better, thanks: I don't know if I'd say "more readable",
but it is much more in the spirit of shmem_recalc_inode(), bringing
the counts into balance sooner rather than later.

I'll follow up with a "fix" patch to Andrew.

> 
> > +			swap_free_nr(folio->swap, nr_pages);
> > +		}
> > +
> > +		/*
> > +		 * The delete_from_swap_cache() below could be left for
> > +		 * shrink_folio_list()'s folio_free_swap() to dispose of;
> > +		 * but I'm a little nervous about letting this folio out of
> > +		 * shmem_writeout() in a hybrid half-tmpfs-half-swap state
> > +		 * e.g. folio_mapping(folio) might give an unexpected answer.
> > +		 */
> > +		delete_from_swap_cache(folio);
> 
> IIUC, Should the delete_from_swap_cache() also be moved into the 'if (!error)'
> branch? Since if shmem_free_swap() has freed the swap entry, it would also
> reclaim the swap cache, no?

No, but it was a good point to raise, and led into more research than
I had anticipated.

No: because shmem_free_swap->free_swap_and_cache_nr->__try_to_reclaim_swap
has to return after doing nothing if its folio_trylock fails: it cannot do
the delete_from_swap_cache() part of the job, which we do here - on this
AOP_WRITEPAGE_ACTIVATE path, we hold the folio_lock throughout.

But it led into more research, because I wanted to point you to the
equivalent coding in shmem_swapin_folio(): but, to my initial alarm,
the equivalent is not there; but used to be.

See 5.8 commit 14235ab36019 ("mm: shmem: remove rare optimization when
swapin races with hole punching").  There (in the deleted lines) you can
see the helpful comment on this case, with its delete_from_swap_cache()
when shmem_add_to_page_cache() fails.  But for memcg-charging reasons,
5.8 found it simpler to drop that, and just let shrink_page_list()
clear up the debris later.

Here in shmem_writeout(), holding folio_lock throughout, we have no
memcg complications, and can go ahead with delete_from_swap_cache(),
both when successfully added back to page cache, and when that fails.

Thanks for checking!
Hugh