lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMgjq7AkUc7hKrEL8cQ7jJTeXYBq7WqM55uQ_SZNNY8vJ7+ODQ@mail.gmail.com>
Date: Wed, 27 Aug 2025 22:35:51 +0800
From: Kairui Song <ryncsn@...il.com>
To: Chris Li <chrisl@...nel.org>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>, 
	Matthew Wilcox <willy@...radead.org>, Hugh Dickins <hughd@...gle.com>, Barry Song <baohua@...nel.org>, 
	Baoquan He <bhe@...hat.com>, Nhat Pham <nphamcs@...il.com>, 
	Kemeng Shi <shikemeng@...weicloud.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>, 
	Ying Huang <ying.huang@...ux.alibaba.com>, Johannes Weiner <hannes@...xchg.org>, 
	David Hildenbrand <david@...hat.com>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/9] mm, swap: always lock and check the swap cache folio
 before use

On Wed, Aug 27, 2025 at 4:21 PM Chris Li <chrisl@...nel.org> wrote:
>
> On Fri, Aug 22, 2025 at 12:21 PM Kairui Song <ryncsn@...il.com> wrote:
>
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index e9d0d2784cd5..b4d39f2a1e0a 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -2379,8 +2379,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >                         count_vm_event(PGMAJFAULT);
> >                         count_memcg_event_mm(fault_mm, PGMAJFAULT);
> >                 }
> > -       } else {
> > -               swap_update_readahead(folio, NULL, 0);
>
> Also this update readahead move to later might have a similar problem.
> All the bail out in the move will lose the readahead status update.
>
> The readahead deed is already done. Missing the status update seems
> incorrect.

Thanks for the detailed review.

The only change I wanted here is that swap readahead update should be
done after checking the folio still corresponds to the swap entry
triggering the swapin. That should have slight to none effect compared
to before considering the extremely tiny time window. We are only
following the convention more strictly.

In theory it might even help to reduce false updates: if the folio no
longer corresponds to the swap entry, we are hitting an unrelated
folio, doing a readahead update will either mislead vma readahead's
address hint, or could clean up the readahead flag of an unrelated
folio without actually using it. If the folio does get hit in the
future, due to the missing readahead flag, the statistic will go
wrong.

>
>
> >         }
> >
> >         if (order > folio_order(folio)) {
> > @@ -2431,6 +2429,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >                 error = -EIO;
> >                 goto failed;
> >         }
> > +       if (!skip_swapcache)
> > +               swap_update_readahead(folio, NULL, 0);
> >         folio_wait_writeback(folio);
> >         nr_pages = folio_nr_pages(folio);
>
>
> >
> > diff --git a/mm/swap.h b/mm/swap.h
> > index efb6d7ff9f30..bb2adbfd64a9 100644
> > --- a/mm/swap.h
> > +++ b/mm/swap.h
> > @@ -52,6 +52,29 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
> >         return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
> >  }
> >
> > +/**
> > + * folio_contains_swap - Does this folio contain this swap entry?
> > + * @folio: The folio.
> > + * @entry: The swap entry to check against.
> > + *
> > + * Swap version of folio_contains()
> > + *
> > + * Context: The caller should have the folio locked to ensure
> > + * nothing will move it out of the swap cache.
> > + * Return: true or false.
> > + */
> > +static inline bool folio_contains_swap(struct folio *folio, swp_entry_t entry)
> > +{
> > +       pgoff_t offset = swp_offset(entry);
> > +
> > +       VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > +       if (unlikely(!folio_test_swapcache(folio)))
> > +               return false;
> > +       if (unlikely(swp_type(entry) != swp_type(folio->swap)))
> > +               return false;
> > +       return offset - swp_offset(folio->swap) < folio_nr_pages(folio);
> > +}
> > +
> >  void show_swap_cache_info(void);
> >  void *get_shadow_from_swap_cache(swp_entry_t entry);
> >  int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
> > @@ -144,6 +167,11 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
> >         return 0;
> >  }
> >
> > +static inline bool folio_contains_swap(struct folio *folio, swp_entry_t entry)
> > +{
> > +       return false;
> > +}
> > +
> >  static inline void show_swap_cache_info(void)
> >  {
> >  }
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index ff9eb761a103..be0d96494dc1 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -70,10 +70,12 @@ void show_swap_cache_info(void)
> >  }
> >
> >  /*
> > - * Lookup a swap entry in the swap cache. A found folio will be returned
> > - * unlocked and with its refcount incremented.
> > + * swap_cache_get_folio - Lookup a swap entry in the swap cache.
> >   *
> > - * Caller must lock the swap device or hold a reference to keep it valid.
> > + * A found folio will be returned unlocked and with its refcount increased.
> > + *
> > + * Context: Caller must ensure @entry is valid and pin the swap device, also
> Is the "pin" the same as  "lock the swap device or hold a reference"?
> Not sure why you changed that comment to "pin".

Yes it's the same thing. We don't lock the device though, the device
can be pinned by the refcounf (get_swap_device) or locking anything
that is referencing the device (locking PTL the a PTE that contains an
swap entry pointing to the device, or locking a swap cache folio of a
swap entry that points to the device). So I juse used the word "pin".
I added some comments in mm/swap.h in later commits about what the
"pin" means.

>
> It seems to me that you want to add the comment for the return value check.
> Is that it?

Right, the caller has to check the folio before use, so I'm trying to
document this convention.

> > + * check the returned folio after locking it (e.g. folio_swap_contains).
> >   */
> >  struct folio *swap_cache_get_folio(swp_entry_t entry)
> >  {
> > @@ -338,7 +340,10 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> >         for (;;) {
> >                 int err;
> >
> > -               /* Check the swap cache in case the folio is already there */
> > +               /*
> > +                * Check the swap cache first, if a cached folio is found,
> > +                * return it unlocked. The caller will lock and check it.
> > +                */
> >                 folio = swap_cache_get_folio(entry);
> >                 if (folio)
> >                         goto got_folio;
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 4b8ab2cb49ca..12f2580ebe8d 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -240,12 +240,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
> >          * Offset could point to the middle of a large folio, or folio
> >          * may no longer point to the expected offset before it's locked.
> >          */
> > -       entry = folio->swap;
> > -       if (offset < swp_offset(entry) || offset >= swp_offset(entry) + nr_pages) {
> > +       if (!folio_contains_swap(folio, entry)) {
> >                 folio_unlock(folio);
> >                 folio_put(folio);
> >                 goto again;
> >         }
> > +       entry = folio->swap;
>
> Can you also check this as well? The "goto again" will have entries
> not assigned compared to previously.
> Too late for me to think straight now if that will cause a problem.

Oh, thanks for pointing this part out. This patch is correct, it's the
original behaviour that is not correct. If the folio is no longer
valid (the if check here failed), changing the `entry` value before
could lead to a wrong look in the next attempt with `goto again`. That
could lead to reclaim of an unrelated folio. It's a trivial issue
though, only might marginally slow down the performance. Maybe I
should make a seperate patch to fix this issue first in case anyone
wants to backport it.

>
> To be continued.
>
> Chris
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ