Message-ID: <CAHk-=wjWMieNV3nAJgoG5prEHBEcOZiREmLUr499tA9NMttEqQ@mail.gmail.com>
Date:   Tue, 12 Jan 2021 19:31:07 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     John Hubbard <jhubbard@...dia.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Yu Zhao <yuzhao@...gle.com>, Andy Lutomirski <luto@...nel.org>,
        Peter Xu <peterx@...hat.com>,
        Pavel Emelyanov <xemul@...nvz.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Minchan Kim <minchan@...nel.org>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Hugh Dickins <hughd@...gle.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
        Kees Cook <keescook@...omium.org>,
        Leon Romanovsky <leonro@...dia.com>,
        Jason Gunthorpe <jgg@...pe.ca>, Jan Kara <jack@...e.cz>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Nadav Amit <nadav.amit@...il.com>, Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse

On Tue, Jan 12, 2021 at 6:16 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> The thing about the speculative page cache references is that they can
> temporarily bump a refcount on a page which _used_ to be in the page
> cache and has now been reallocated as some other kind of page.

Oh, and thinking about this made me think we might actually have a
serious bug here, and it has nothing whatsoever to do with COW, GUP,
or even the page count itself.

It's unlikely enough that I think it's mostly theoretical, but tell me
I'm wrong.

PLEASE tell me I'm wrong:

CPU1 does page_cache_get_speculative under RCU lock

CPU2 frees and re-uses the page

    CPU1                CPU2
    ----                ----

    page = xas_load(&xas);
    if (!page_cache_get_speculative(page))
            goto repeat;
    .. succeeds ..

                        remove page from XA
                        release page
                        reuse for something else

    .. and then re-check ..
    if (unlikely(page != xas_reload(&xas))) {
            put_page(page);
            goto repeat;
    }

Ok, the above all looks fine. We got the speculative ref, but then we
noticed that it's not valid any more, so we put it again. All good,
right?

Wrong.

What if that "reuse for something else" was actually really quick, and
both allocated and released it?

That still sounds good, right? Yes, now the "put_page()" will be the
one that _actually_ releases the page, but we're still fine, right?

Very very wrong.

The "reuse for something else" on CPU2 might have gotten not an
order-0 page, but a *high-order* page. So it allocated (and then
immediately free'd) maybe an order-2 allocation with _four_ pages, and
the re-use happened when we had coalesced the buddy pages.

But when we release the page on CPU1, we will release just _one_ page,
and the other three pages will be lost forever.

IOW, we restored the page count perfectly fine, but we screwed up the
page sizes and buddy information.
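
To make the accounting concrete, here is a toy userspace model of that
sequence. It is only a sketch under stated assumptions, not kernel code:
the names (toy_block, toy_alloc, toy_free_pages, toy_put_page,
toy_leaked_pages) are hypothetical, the "allocator" is just a counter of
free pages, and it assumes one particular interleaving in which the stale
speculative reference is still held when the owner frees its order-N
block.

```c
#include <assert.h>

#define TOTAL_PAGES 4u  /* toy pool: exactly one order-2 block */

/* Hypothetical stand-in for the first page of an allocation. */
struct toy_block {
    unsigned int order;  /* the block spans (1 << order) pages */
    int refcount;        /* reference count kept on the first page */
};

static unsigned int free_pages = TOTAL_PAGES;

/* Take (1 << order) pages out of the pool; the owner holds one reference. */
static void toy_alloc(struct toy_block *b, unsigned int order)
{
    assert((1u << order) <= free_pages);
    free_pages -= 1u << order;
    b->order = order;
    b->refcount = 1;
}

/* Owner's free path: it knows the order, but only returns the pages
 * if it happens to drop the last reference. */
static void toy_free_pages(struct toy_block *b, unsigned int order)
{
    if (--b->refcount == 0)
        free_pages += 1u << order;
}

/* Speculative holder's put path: it believes this is a single page,
 * so dropping the last reference returns exactly one page. */
static void toy_put_page(struct toy_block *b)
{
    if (--b->refcount == 0)
        free_pages += 1u;
}

/* Replay the interleaving and report how many pages never come back. */
unsigned int toy_leaked_pages(unsigned int order)
{
    struct toy_block b;

    free_pages = TOTAL_PAGES;
    toy_alloc(&b, order);        /* CPU2: reuse for something else */
    b.refcount++;                /* CPU1: stale speculative ref is held */
    toy_free_pages(&b, order);   /* CPU2: frees, but is not the last ref */
    toy_put_page(&b);            /* CPU1: last ref, returns one page only */

    return TOTAL_PAGES - free_pages;
}
```

In this model toy_leaked_pages(2) returns 3, matching the "other three
pages" above, while toy_leaked_pages(0) returns 0, since an order-0
reuse is released correctly by the final put.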

Ok, so the above is so unlikely from a timing standpoint that I don't
think it ever happens, but I don't see why it couldn't happen in
theory.

Please somebody tell me I'm missing some clever thing we do to make
sure this can actually not happen..

         Linus
