Message-ID: <CAHbLzkpa5MQBtYcRPWu4vNDn=Q8SKStQ-9wKYWogqRrMR3Aonw@mail.gmail.com>
Date: Wed, 16 Jun 2021 11:40:50 -0700
From: Yang Shi <shy828301@...il.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Jann Horn <jannh@...gle.com>, John Hubbard <jhubbard@...dia.com>,
Matthew Wilcox <willy@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux-MM <linux-mm@...ck.org>,
kernel list <linux-kernel@...r.kernel.org>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Jan Kara <jack@...e.cz>, stable <stable@...r.kernel.org>
Subject: Re: [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page()
On Wed, Jun 16, 2021 at 10:27 AM Vlastimil Babka <vbabka@...e.cz> wrote:
>
> On 6/16/21 1:10 AM, Yang Shi wrote:
> > On Tue, Jun 15, 2021 at 5:10 AM Jann Horn <jannh@...gle.com> wrote:
> >>
> >> On Tue, Jun 15, 2021 at 8:37 AM John Hubbard <jhubbard@...dia.com> wrote:
> >> > On 6/14/21 6:20 PM, Jann Horn wrote:
> >> > > try_grab_compound_head() is used to grab a reference to a page from
> >> > > get_user_pages_fast(), which is only protected against concurrent
> >> > > freeing of page tables (via local_irq_save()), but not against
> >> > > concurrent TLB flushes, freeing of data pages, or splitting of compound
> >> > > pages.
> >> [...]
> >> > Reviewed-by: John Hubbard <jhubbard@...dia.com>
> >>
> >> Thanks!
> >>
> >> [...]
> >> > > @@ -55,8 +72,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
> >> > > if (WARN_ON_ONCE(page_ref_count(head) < 0))
> >> > > return NULL;
> >> > > if (unlikely(!page_cache_add_speculative(head, refs)))
> >> > > return NULL;
> >> > > +
> >> > > + /*
> >> > > + * At this point we have a stable reference to the head page; but it
> >> > > + * could be that between the compound_head() lookup and the refcount
> >> > > + * increment, the compound page was split, in which case we'd end up
> >> > > + * holding a reference on a page that has nothing to do with the page
> >> > > + * we were given anymore.
> >> > > + * So now that the head page is stable, recheck that the pages still
> >> > > + * belong together.
> >> > > + */
> >> > > + if (unlikely(compound_head(page) != head)) {
> >> >
> >> > I was just wondering about what all could happen here. Such as: page gets split,
> >> > reallocated into a different-sized compound page, one that still has page pointing
> >> > to head. I think that's OK, because we don't look at or change other huge page
> >> > fields.
> >> >
> >> > But I thought I'd mention the idea in case anyone else has any clever ideas about
> >> > how this simple check might be insufficient here. It seems fine to me, but I
> >> > routinely lack enough imagination about concurrent operations. :)
> >>
> >> Hmmm... I think the scariest aspect here is probably the interaction
> >> with concurrent allocation of a compound page on architectures with
> >> store-store reordering (like ARM). *If* the page allocator handled
> >> compound pages with lockless, non-atomic percpu freelists, I think it
> >> might be possible that the zeroing of tail_page->compound_head in
> >> put_page() could be reordered after the page has been freed,
> >> reallocated and set to refcount 1 again?
> >>
> >> That shouldn't be possible at the moment, but it is still a bit scary.
> >
> > It might be possible after Mel's "mm/page_alloc: Allow high-order
> > pages to be stored on the per-cpu lists" patch
> > (https://patchwork.kernel.org/project/linux-mm/patch/20210611135753.GC30378@techsingularity.net/).
>
> Those would be percpu indeed, but not "lockless, non-atomic", no? They are
> protected by a local_lock.
The local_lock is *not* an actual lock on non-PREEMPT_RT kernels,
IIUC. It just disables preemption and IRQs. Preempt disable is a
no-op on a non-preempt kernel, and while IRQ disable does guarantee
atomic context, I'm not sure whether that is equivalent to the
"atomic freelists" in Jann's context.
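
To illustrate, on !PREEMPT_RT the local_lock API basically reduces to
the below (simplified from include/linux/local_lock_internal.h, with
the lockdep annotations left out), so no lock word is ever taken:

	/* !PREEMPT_RT flavour, lockdep calls omitted */
	#define local_lock(lock)			preempt_disable()
	#define local_unlock(lock)			preempt_enable()
	#define local_lock_irqsave(lock, flags)		local_irq_save(flags)
	#define local_unlock_irqrestore(lock, flags)	local_irq_restore(flags)

So with Mel's patch the per-cpu lists would still be manipulated with
IRQs disabled on non-RT, just not under a spinlock.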
>
> >>
> >>
> >> I think the lockless page cache code also has to deal with somewhat
> >> similar ordering concerns when it uses page_cache_get_speculative(),
> >> e.g. in mapping_get_entry() - first it looks up a page pointer with
> >> xas_load(), and any access to the page later on would be a _dependent
> >> load_, but if the page then gets freed, reallocated, and inserted into
> >> the page cache again before the refcount increment and the re-check
> >> using xas_reload(), then there would be no data dependency from
> >> xas_reload() to the following use of the page...
> >>
> >
>
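For reference, that lockless lookup pattern looks roughly like the
below (a heavily simplified, hypothetical sketch modeled on
mapping_get_entry(); shadow/value entries are just skipped, and it is
not the exact mainline code):

	#include <linux/pagemap.h>
	#include <linux/xarray.h>

	static struct page *lookup_page_speculative(struct address_space *mapping,
						    pgoff_t index)
	{
		XA_STATE(xas, &mapping->i_pages, index);
		struct page *page;

		rcu_read_lock();
	repeat:
		xas_reset(&xas);
		page = xas_load(&xas);			/* 1) unlocked lookup */
		if (!page || xa_is_value(page)) {
			page = NULL;
			goto out;
		}

		if (!page_cache_get_speculative(page))	/* 2) try to take a ref */
			goto repeat;

		/*
		 * 3) Re-check: the page may have been freed and reused
		 *    between (1) and (2); if the slot no longer points at
		 *    it, drop the reference and retry.  Jann's point is
		 *    that later uses of "page" have a data dependency only
		 *    on the load in (1), not on this re-load.
		 */
		if (unlikely(page != xas_reload(&xas))) {
			put_page(page);
			goto repeat;
		}
	out:
		rcu_read_unlock();
		return page;
	}

which is the same "speculative ref then re-check" shape as the
try_grab_compound_head() fix above.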