[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACePvbWnsrGxUv15R4kMpc10wsBE2Bc-rapGyBYjekrqFJgJbw@mail.gmail.com>
Date: Fri, 29 Aug 2025 18:53:53 -0700
From: Chris Li <chrisl@...nel.org>
To: Kairui Song <ryncsn@...il.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>, Hugh Dickins <hughd@...gle.com>, Barry Song <baohua@...nel.org>,
Baoquan He <bhe@...hat.com>, Nhat Pham <nphamcs@...il.com>,
Kemeng Shi <shikemeng@...weicloud.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
Ying Huang <ying.huang@...ux.alibaba.com>, Johannes Weiner <hannes@...xchg.org>,
David Hildenbrand <david@...hat.com>, Yosry Ahmed <yosryahmed@...gle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/9] mm, swap: always lock and check the swap cache folio
before use
On Wed, Aug 27, 2025 at 7:36 AM Kairui Song <ryncsn@...il.com> wrote:
>
> On Wed, Aug 27, 2025 at 4:21 PM Chris Li <chrisl@...nel.org> wrote:
> >
> > On Fri, Aug 22, 2025 at 12:21 PM Kairui Song <ryncsn@...il.com> wrote:
> >
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index e9d0d2784cd5..b4d39f2a1e0a 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -2379,8 +2379,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> > > count_vm_event(PGMAJFAULT);
> > > count_memcg_event_mm(fault_mm, PGMAJFAULT);
> > > }
> > > - } else {
> > > - swap_update_readahead(folio, NULL, 0);
> >
> > Also this update readahead move to later might have a similar problem.
> > All the bail out in the move will lose the readahead status update.
> >
> > The readahead deed is already done. Missing the status update seems
> > incorrect.
>
> Thanks for the detailed review.
>
> The only change I wanted here is that swap readahead update should be
> done after checking the folio still corresponds to the swap entry
> triggering the swapin. That should have slight to none effect compared
> to before considering the extremely tiny time window. We are only
> following the convention more strictly.
>
> In theory it might even help to reduce false updates: if the folio no
> longer corresponds to the swap entry, we are hitting an unrelated
> folio, doing a readahead update will either mislead vma readahead's
> address hint, or could clean up the readahead flag of an unrelated
> folio without actually using it. If the folio does get hit in the
> future, due to the missing readahead flag, the statistic will go
> wrong.
So the missing readahead stats update behavior is the correct and
better behavior. I suggest you spit that out as a separate patch with
appropriate comments about it too. It is also easier to bisect the
commit if that kind of the subtle change which is considered safe
turns out causing a problem. Causing problem not happen very often but
it does happen before.
> > > /*
> > > - * Lookup a swap entry in the swap cache. A found folio will be returned
> > > - * unlocked and with its refcount incremented.
> > > + * swap_cache_get_folio - Lookup a swap entry in the swap cache.
> > > *
> > > - * Caller must lock the swap device or hold a reference to keep it valid.
> > > + * A found folio will be returned unlocked and with its refcount increased.
> > > + *
> > > + * Context: Caller must ensure @entry is valid and pin the swap device, also
> > Is the "pin" the same as "lock the swap device or hold a reference"?
> > Not sure why you changed that comment to "pin".
>
> Yes it's the same thing. We don't lock the device though, the device
> can be pinned by the refcounf (get_swap_device) or locking anything
> that is referencing the device (locking PTL the a PTE that contains an
> swap entry pointing to the device, or locking a swap cache folio of a
> swap entry that points to the device). So I juse used the word "pin".
> I added some comments in mm/swap.h in later commits about what the
> "pin" means.
In that case why not reuse the previous comment keeping "lock the swap
device or hold a reference" instead of "pin"?
> > It seems to me that you want to add the comment for the return value check.
> > Is that it?
>
> Right, the caller has to check the folio before use, so I'm trying to
> document this convention.
Again, I recommend reducing the unnecessary impact to the code, make
it more obvious what you did actually change. I spend quite some time
there trying to figure out what you are trying to accomplish with the
comments.
> > > + * check the returned folio after locking it (e.g. folio_swap_contains).
> > > */
> > > struct folio *swap_cache_get_folio(swp_entry_t entry)
> > > {
> > > @@ -338,7 +340,10 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> > > for (;;) {
> > > int err;
> > >
> > > - /* Check the swap cache in case the folio is already there */
> > > + /*
> > > + * Check the swap cache first, if a cached folio is found,
> > > + * return it unlocked. The caller will lock and check it.
> > > + */
> > > folio = swap_cache_get_folio(entry);
> > > if (folio)
> > > goto got_folio;
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index 4b8ab2cb49ca..12f2580ebe8d 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -240,12 +240,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> > > * Offset could point to the middle of a large folio, or folio
> > > * may no longer point to the expected offset before it's locked.
> > > */
> > > - entry = folio->swap;
> > > - if (offset < swp_offset(entry) || offset >= swp_offset(entry) + nr_pages) {
> > > + if (!folio_contains_swap(folio, entry)) {
> > > folio_unlock(folio);
> > > folio_put(folio);
> > > goto again;
> > > }
> > > + entry = folio->swap;
> >
> > Can you also check this as well? The "goto again" will have entries
> > not assigned compared to previously.
> > Too late for me to think straight now if that will cause a problem.
>
> Oh, thanks for pointing this part out. This patch is correct, it's the
> original behaviour that is not correct. If the folio is no longer
> valid (the if check here failed), changing the `entry` value before
> could lead to a wrong look in the next attempt with `goto again`. That
> could lead to reclaim of an unrelated folio. It's a trivial issue
> though, only might marginally slow down the performance. Maybe I
> should make a seperate patch to fix this issue first in case anyone
> wants to backport it.
Thanks for the explanation, please do split this subtle behavior
change out with appropriate commit messages documenting your change,
why it is safe and the better behavior.
Thanks
Chris
Powered by blists - more mailing lists