linux-kernel - Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0aa27d7d-0db6-94ee-ca16-91d19997286b@redhat.com>
Date:   Fri, 17 Dec 2021 23:29:44 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        Linux-MM <linux-mm@...ck.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via
 FAULT_FLAG_UNSHARE (!hugetlb)

On 17.12.21 22:50, Linus Torvalds wrote:
> On Fri, Dec 17, 2021 at 1:47 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> For now I have not heard a compelling argument why the mapcount is
>> dubious, I repeat:
>>
>> * mapcount can only increase due to fork()
>> * mapcount can decrease due to unmap / zap
>>
>> We can protect from the transtition == 1 -> >1 using the mmap_lock.
>>
>> For COW the mapcount is the only thing that matters *if we take GUP* out
>> of the equation. And that's exactly what we
> 
> What do you have against just doing what we already do in other parts,
> that a/b thing?

Let me put it that way: I just want to get all of this fixed for good. I
don't particularly care how. *But* I will fight for something that is
superior, logically makes sense (at least to me :) ) and not super
complicated. And I call also avoiding unnecessary COW "superior".

I do know that what this series proposes fixes the CVE: GUP after fork.
I do know that the part 2 we'll be sending out next year will fix
everything else we discovered so far, and it will rely on this as a
basis, to not reintroduce any other COW issues we've seen so far.

If someone can propose something comparable that makes all discovered
problems go away I'll be *extremely* happy. We have reproducers for all
issues, so it's easy to verify, and I'm planning on extending the
selftests to cover even more corner cases.

So far, I am not convinced that using the mapcount is dubious or
problematic, I just don't see how. COW is an about sharing pages between
processes, each expressed in the mapcount. It's a pure optimization for
exactly that purpose.

GUP is the problem, not COW, not the mapcount. To me the mapcount is the
only thing that makes sense in COW+unsharing logic, and GUP has to be
taught to identify it and resolve it -> unshare when it detects a shared
anaonymous page.

> 
> Which avoids the whole mmap_sem issue. That was a big issue for the
> rdma people, afaik.

While I do care about future use cases, I cannot possibly see fork() not
requiring the mmap_lock in the foreseeable future. Just so much depends
on it as of now.

And after all, fixing everything what we discovered so far is more
important to me than something like that for the future. We have other
problems to solve in that regard.

----------------------------------------------------------------------

I didn't want to talk about hugetlb here but I will just because it's a
good example why using the refocunt is just wrong -- because unnecessary
COW are just absolutely problematic.

Assume you have a R/O mapped huge page that is only mapped into your
process and you get a write fault. What should your COW logic do?

a) Rely on the mapcount? Yes, iff GUP has been taught to unshare
   properly, because then it expresses exactly what we want to know.
   mapcount == 1 -> reuse. mapcount > 1 -> COW.

b) Rely on the refocunt? If we have a speculative refrence on the page
   we would COW. As huge pages are a scarce resource we can easily just
   not have a free huge page anymore and crash the application. The app
   didn't do anything wrong.

So teaching the hugetlb COW code to rely on the refount would just be
highly fragile.

-- 
Thanks,

David / dhildenb