Message-ID: <CACePvbUYDfne6mZ32c3YzxcwH16u-G3YXoYoGaXL5EtxjDH8mw@mail.gmail.com>
Date: Fri, 29 Aug 2025 18:42:20 -0700
From: Chris Li <chrisl@...nel.org>
To: Kairui Song <ryncsn@...il.com>
Cc: linux-mm <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	Matthew Wilcox <willy@...radead.org>, Hugh Dickins <hughd@...gle.com>, Barry Song <baohua@...nel.org>, 
	Baoquan He <bhe@...hat.com>, Nhat Pham <nphamcs@...il.com>, 
	Kemeng Shi <shikemeng@...weicloud.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>, 
	Ying Huang <ying.huang@...ux.alibaba.com>, Johannes Weiner <hannes@...xchg.org>, 
	David Hildenbrand <david@...hat.com>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>, 
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/9] mm, swap: always lock and check the swap cache folio
 before use

Hi Kairui,

Sorry for the late reply. I have been super busy this week.

On Wed, Aug 27, 2025 at 6:44 AM Kairui Song <ryncsn@...il.com> wrote:
>
> On Wed, Aug 27, 2025 at 4:06 PM Chris Li <chrisl@...nel.org> wrote:
> >
> > On Fri, Aug 22, 2025 at 12:21 PM Kairui Song <ryncsn@...il.com> wrote:
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 10ef528a5f44..9ca8e1873c6e 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -4661,12 +4661,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > >                 goto out;
> > >
> > >         folio = swap_cache_get_folio(entry);
> > > -       if (folio) {
> > > -               swap_update_readahead(folio, vma, vmf->address);
> > > -               page = folio_file_page(folio, swp_offset(entry));
> > > -       }
> > >         swapcache = folio;
> > > -
> >
> > Can simplify as:
> >            folio = swapcache = swap_cache_get_folio(entry);
>
> checkpatch.pl is unhappy about it:
>
> checkpatch: multiple assignments should be avoided

Ah, never mind then. I actually like multiple assignments, but I can
see why checkpatch wants to ban them.

>
> >
> > >         if (!folio) {
> > >                 if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
> > >                     __swap_count(entry) == 1) {
> > > @@ -4735,20 +4730,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > >                 ret = VM_FAULT_MAJOR;
> > >                 count_vm_event(PGMAJFAULT);
> > >                 count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
> > > -               page = folio_file_page(folio, swp_offset(entry));
> > > -       } else if (PageHWPoison(page)) {
> > > -               /*
> > > -                * hwpoisoned dirty swapcache pages are kept for killing
> > > -                * owner processes (which may be unknown at hwpoison time)
> > > -                */
> > > -               ret = VM_FAULT_HWPOISON;
> > > -               goto out_release;
> >
> > Here you move the HWPosion(page) bail out from before taking the page
> > lock to after the page lock. The HWPosion page should be able to bail
> > out without taking the lock.
> >
> >
> > >         }
> > >
> > >         ret |= folio_lock_or_retry(folio, vmf);
> > >         if (ret & VM_FAULT_RETRY)
> > >                 goto out_release;
> > >
> > > +       page = folio_file_page(folio, swp_offset(entry));
> > >         if (swapcache) {
> > >                 /*
> > >                  * Make sure folio_free_swap() or swapoff did not release the
> > > @@ -4757,10 +4745,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > >                  * swapcache, we need to check that the page's swap has not
> > >                  * changed.
> > >                  */
> > > -               if (unlikely(!folio_test_swapcache(folio) ||
> > > -                            page_swap_entry(page).val != entry.val))
> > > +               if (!folio_contains_swap(folio, entry))
> > >                         goto out_page;
> > >
> > > +               if (PageHWPoison(page)) {
> > > +                       /*
> > > +                        * hwpoisoned dirty swapcache pages are kept for killing
> > > +                        * owner processes (which may be unknown at hwpoison time)
> > > +                        */
> > > +                       ret = VM_FAULT_HWPOISON;
> > > +                       goto out_page;
> >
> > It seems you bail out with the page still locked, that seems like a bug to me.
>
> I think it's the original behaviour that is fragile. The folio
> returned by swap_cache_get_folio is unstable unless locked, so the
> folio could have been removed from the swap cache, or marked as
> poisoned by some other thread. So the PageHWPoison here could be
> tested against an unrelated page, which leads to false positive
> PageHWPoison results. We have encountered several similar bugs caused
> by using folios returned by swap cache lookup without locking and
> checking them first.
>
> So checking HWPoison after locking is actually safer.

How do I verify that the code can handle HWPoison arriving between the
unlock and the lock of the page? I haven't followed the HWPoison path
very closely. I am still wondering how the HWPoison code handles a
page that was previously unlocked but is now locked, without any
additional code change.

If you want this behavior, I strongly suggest you split this portion
of the change out, ideally outside this 9-patch series if you can
afford to do so.

The patch description mentions nothing about this kind of subtle
behavior change; as a reviewer, I was caught off guard by it. At the
very least the commit message should mention the change and explain
why it is needed and why it is safe.

Having a very subtle behavior change buried in a large code
restructuring is the worst case. Splitting it out reduces the
reviewer's workload in reasoning about the behavior change and makes
your series easier to review.

Does it make sense?

Chris
