linux-kernel - Re: [PATCH] mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage

Message-ID: <CAO9qdTECQWV6E-sor-5o8xp56knF5LpReszsG4WpFbDKTiyY1Q@mail.gmail.com>
Date: Sat, 23 Aug 2025 23:40:13 +0900
From: Jeongjun Park <aha310510@...il.com>
To: Giorgi Tchankvetadze <giorgitchankvetadze1997@...il.com>
Cc: akpm@...ux-foundation.org, david@...hat.com, leitao@...ian.org, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, muchun.song@...ux.dev, 
	osalvador@...e.de, syzbot+417aeb05fd190f3a6da9@...kaller.appspotmail.com
Subject: Re: [PATCH] mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()

Hello Giorgi,

Giorgi Tchankvetadze <giorgitchankvetadze1997@...il.com> wrote:
>
> +               /*
> +                * Check surplus_huge_pages without taking hugetlb_lock.
> +                * A race here is okay:
> +                *   - If surplus goes 0 -> nonzero, we skip restore.
> +                *   - If surplus goes nonzero -> 0, we also skip.
> +                * In both cases we just miss a restore, which is safe.
> +                */
> +               {
> +                       unsigned long surplus = READ_ONCE(h->surplus_huge_pages);
> +
> +                       if (!surplus &&
> +                           __vma_private_lock(vma) &&
> +                           folio_test_anon(folio) &&
> +                           READ_ONCE(h->surplus_huge_pages) == surplus) {
> +                               folio_set_hugetlb_restore_reserve(folio);
> +                               adjust_reservation = true;
> +                       }
> +               }
>
>                 spin_unlock(ptl);
>
>

Why do you think skipping restoration is safe?

As the code comments state, if the scheduled restoration of an anonymous
page's reservation isn't performed in time, the backing page can be stolen.

If the original owner then tries to fault in the stolen page, the fault
fails and the task gets a SIGBUS.

Of course, this is rare because it depends on a race condition. But in
workloads that use hugetlb heavily, surplus_huge_pages increases and
decreases frequently, and reservations that are not restored in time
because of this race keep accumulating, so this is not a race that can be
ignored.
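
To make the locked check concrete, here is a minimal userspace sketch of
the pattern, not kernel code. It assumes a pthread mutex standing in for
hugetlb_lock; struct hstate_model, struct folio_model, set_surplus() and
check_and_mark_restore() are hypothetical names used only for
illustration. The point is that the counter is read and the restore flag
is set inside one critical section, so a writer cannot change the counter
between the check and the flag update.

```c
/* Userspace model only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct hstate_model {
	pthread_mutex_t lock;             /* stands in for hugetlb_lock */
	unsigned long surplus_huge_pages; /* only touched under ->lock */
};

struct folio_model {
	bool anon;            /* models folio_test_anon() */
	bool restore_reserve; /* models the restore-reserve flag */
};

/* Writers update the counter under the same lock as the reader. */
static void set_surplus(struct hstate_model *h, unsigned long n)
{
	pthread_mutex_lock(&h->lock);
	h->surplus_huge_pages = n;
	pthread_mutex_unlock(&h->lock);
}

/*
 * Read the counter and set the flag in one critical section.
 * Returns true when the reservation was marked for restore.
 */
static bool check_and_mark_restore(struct hstate_model *h, bool vma_private,
				   struct folio_model *folio)
{
	bool adjust_reservation = false;

	assert(h != NULL && folio != NULL);

	pthread_mutex_lock(&h->lock);
	if (!h->surplus_huge_pages && vma_private && folio->anon) {
		folio->restore_reserve = true;
		adjust_reservation = true;
	}
	pthread_mutex_unlock(&h->lock);

	return adjust_reservation;
}
```

In this model a caller sees the flag set only when the counter was zero
for the whole check-and-set step. Whether the cost of a global lock is
acceptable on this path is a separate question that the sketch does not
answer.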

>
>
> On 8/23/2025 5:07 AM, Andrew Morton wrote:
> > On Fri, 22 Aug 2025 14:58:57 +0900 Jeongjun Park <aha310510@...il.com> wrote:
> >
> >> When restoring a reservation for an anonymous page, we need to check to
> >> freeing a surplus. However, __unmap_hugepage_range() causes data race
> >> because it reads h->surplus_huge_pages without the protection of
> >> hugetlb_lock.
> >>
> >> Therefore, we need to add missing hugetlb_lock.
> >>
> >> ...
> >>
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -5951,6 +5951,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>  		 * If there we are freeing a surplus, do not set the restore
> >>  		 * reservation bit.
> >>  		 */
> >> +		spin_lock_irq(&hugetlb_lock);
> >> +
> >>  		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
> >>  		    folio_test_anon(folio)) {
> >>  			folio_set_hugetlb_restore_reserve(folio);
> >> @@ -5958,6 +5960,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>  			adjust_reservation = true;
> >>  		}
> >>
> >> +		spin_unlock_irq(&hugetlb_lock);
> >>  		spin_unlock(ptl);
> >>
> >
> > Does hugetlb_lock nest inside page_table_lock?
> >
> > It's a bit sad to be taking a global lock just to defend against some
> > alleged data race which probably never happens.  Doing it once per
> > hugepage probably won't matter but still, is there something more
> > proportionate that we can do here?
> >
>
>

Regards,
Jeongjun Park