Message-ID: <Y2PgmMs5q5jOEN0K@x1n>
Date: Thu, 3 Nov 2022 11:39:04 -0400
From: Peter Xu <peterx@...hat.com>
To: James Houghton <jthoughton@...gle.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Mike Kravetz <mike.kravetz@...cle.com>,
David Hildenbrand <david@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Rik van Riel <riel@...riel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Muchun Song <songmuchun@...edance.com>,
Miaohe Lin <linmiaohe@...wei.com>,
Nadav Amit <nadav.amit@...il.com>
Subject: Re: [PATCH RFC 09/10] mm/hugetlb: Make hugetlb_fault() RCU-safe
On Wed, Nov 02, 2022 at 11:04:01AM -0700, James Houghton wrote:
> On Sun, Oct 30, 2022 at 2:30 PM Peter Xu <peterx@...hat.com> wrote:
> >
> > RCU makes sure the pte_t* won't go away from under us. Please refer to the
> > comment above huge_pte_offset() for more information.
>
> Thanks for this series, Peter! :)
Thanks for reviewing, James!
>
> >
> > Signed-off-by: Peter Xu <peterx@...hat.com>
> > ---
> > mm/hugetlb.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 5dc87e4e6780..6d336d286394 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5822,6 +5822,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >  	int need_wait_lock = 0;
> >  	unsigned long haddr = address & huge_page_mask(h);
> >  
> > +	/* For huge_pte_offset() */
> > +	rcu_read_lock();
> >  	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
> >  	if (ptep) {
> >  		/*
> > @@ -5830,13 +5832,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		 * not actually modifying content here.
> >  		 */
> >  		entry = huge_ptep_get(ptep);
> > +		rcu_read_unlock();
> >  		if (unlikely(is_hugetlb_entry_migration(entry))) {
> >  			migration_entry_wait_huge(vma, ptep);
>
> ptep is used here (and we dereference it in
> `__migration_entry_wait_huge`), so this looks unsafe to me. A simple
> way to fix this would be to move the migration entry check after the
> huge_pte_alloc call.
Right, I definitely overlooked the migration entries in both patches
(including the previous one that you commented on), thanks for pointing
that out.

Though moving that after huge_pte_alloc() may have a similar problem, iiuc.
The thing is we need either the vma lock or RCU to protect access to the
pte*, while the pte* page and its pgtable lock can be accessed very deep
in the migration core (e.g., migration_entry_wait_on_locked()), and that
lock cannot be released until the thread has queued itself on the
waitqueue.

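To spell out the dependency (a rough, annotated call chain from memory,
so the details may be slightly off):

  hugetlb_fault()
    ptep = huge_pte_offset(mm, haddr, sz);     /* needs RCU or vma lock */
    entry = huge_ptep_get(ptep);
    if (is_hugetlb_entry_migration(entry))
      migration_entry_wait_huge(vma, ptep)
        __migration_entry_wait_huge(ptep, ptl)
          spin_lock(ptl);
          pte = huge_ptep_get(ptep);           /* ptep dereferenced again */
          migration_entry_wait_on_locked(..., ptl)
            /* queues the task on the waitqueue, and only after that drops
               ptl and schedules; whatever protects the pte* must stay held
               across all of the above */
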
So far I don't see a good way to achieve this other than adding a hook to
migration_entry_wait_on_locked(), so that any lock held for huge migrations
can be properly released after the pgtable lock is dropped but before the
thread yields itself.

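Something like the below is what I have in mind, though only as a sketch:
the callback type and the extra parameters are made up for illustration
(not an existing API), and the body is elided except for the ordering that
matters:

typedef void (*migration_wait_unlock_t)(void *arg);

void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
				    spinlock_t *ptl,
				    migration_wait_unlock_t unlock,
				    void *unlock_arg)
{
	/* ... queue ourselves on the folio waitqueue, still under ptl ... */

	spin_unlock(ptl);		/* pgtable lock released */

	/*
	 * New hook: hugetlb callers can drop rcu_read_lock() or the vma
	 * lock here, after the pgtable lock is gone but before we yield.
	 */
	if (unlock)
		unlock(unlock_arg);

	schedule();

	/* ... wait for the migration to finish, then finish_wait() ... */
}

The hugetlb side would then pass a small helper that simply does
rcu_read_unlock() (or releases the vma lock), so the pte* stays protected
for the whole walk into the migration core.
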
--
Peter Xu