Message-ID: <CAO9qdTECQWV6E-sor-5o8xp56knF5LpReszsG4WpFbDKTiyY1Q@mail.gmail.com>
Date: Sat, 23 Aug 2025 23:40:13 +0900
From: Jeongjun Park <aha310510@...il.com>
To: Giorgi Tchankvetadze <giorgitchankvetadze1997@...il.com>
Cc: akpm@...ux-foundation.org, david@...hat.com, leitao@...ian.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, muchun.song@...ux.dev,
osalvador@...e.de, syzbot+417aeb05fd190f3a6da9@...kaller.appspotmail.com
Subject: Re: [PATCH] mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
Hello Giorgi,
Giorgi Tchankvetadze <giorgitchankvetadze1997@...il.com> wrote:
>
> + /*
> + * Check surplus_huge_pages without taking hugetlb_lock.
> + * A race here is okay:
> + * - If surplus goes 0 -> nonzero, we skip restore.
> + * - If surplus goes nonzero -> 0, we also skip.
> + * In both cases we just miss a restore, which is safe.
> + */
> + {
> + unsigned long surplus = READ_ONCE(h->surplus_huge_pages);
> +
> + if (!surplus &&
> + __vma_private_lock(vma) &&
> + folio_test_anon(folio) &&
> + READ_ONCE(h->surplus_huge_pages) == surplus) {
> + folio_set_hugetlb_restore_reserve(folio);
> + adjust_reservation = true;
> + }
> + }
>
> spin_unlock(ptl);
>
>
Why do you think skipping the restore is safe?
As the comment in __unmap_hugepage_range() explains, if the reservation for
an anonymous page is not restored in time, the backing page can be stolen by
someone else. If the original owner later faults on that address, the fault
fails and the task gets SIGBUS.
Of course, this only happens when the race is actually hit, so it is rare.
But in workloads that use hugetlb heavily, surplus_huge_pages goes up and
down frequently, and backing pages whose reservations were not restored
because of this race keep accumulating, so this race cannot be ignored.
>
>
> On 8/23/2025 5:07 AM, Andrew Morton wrote:
> > On Fri, 22 Aug 2025 14:58:57 +0900 Jeongjun Park <aha310510@...il.com> wrote:
> >
> > > When restoring a reservation for an anonymous page, we need to check to
> > > freeing a surplus. However, __unmap_hugepage_range() causes data race
> > > because it reads h->surplus_huge_pages without the protection of
> > > hugetlb_lock.
> > >
> > > Therefore, we need to add missing hugetlb_lock.
> > >
> > > ...
> > >
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -5951,6 +5951,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > >                  * If there we are freeing a surplus, do not set the restore
> > >                  * reservation bit.
> > >                  */
> > > +               spin_lock_irq(&hugetlb_lock);
> > > +
> > >                 if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
> > >                     folio_test_anon(folio)) {
> > >                         folio_set_hugetlb_restore_reserve(folio);
> > > @@ -5958,6 +5960,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > >                         adjust_reservation = true;
> > >                 }
> > >
> > > +               spin_unlock_irq(&hugetlb_lock);
> > >                 spin_unlock(ptl);
> > >
> >
> > Does hugetlb_lock nest inside page_table_lock?
> >
> > It's a bit sad to be taking a global lock just to defend against some
> > alleged data race which probably never happens. Doing it once per
> > hugepage probably won't matter but still, is there something more
> > proportionate that we can do here?
> >
>
>
Regards,
Jeongjun Park