lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0de1a3cb-8286-15bd-aec1-2b284bf8918a@redhat.com>
Date:   Sat, 18 Dec 2021 00:29:55 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        Linux-MM <linux-mm@...ck.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via
 FAULT_FLAG_UNSHARE (!hugetlb)

On 17.12.21 23:58, Linus Torvalds wrote:
> On Fri, Dec 17, 2021 at 2:29 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> While I do care about future use cases, I cannot possibly see fork() not
>> requiring the mmap_lock in the foreseeable future. Just so much depends
>> on it as of now.
> 
> It's not that *fork()* depends on it.
> 
> Of course fork() takes the mmap_sem.
> 
> It's that fast-gup really really doesn't want it, and can't take it.

Oh, sorry, I was misreading your mail.

> 
> So any fast-gup user fundamentally cannot look at mapcount(), because
> that would be fundamentally wrong and racy, and could race with fork.
> 

So we're concerned about fork() racing with gup-fast-only. gup-fast-only
runs essentially lockless.

As we read an atomic mapcount, the relevant thing to happen would be
that we read an mapcount of 1 and decide to share the page, but there is
concurrent fork() such that the mapcount is increased.

So the parent process has to be the only process owning that page for
this to trigger (mapcount == 1). In that situation, we would pin the
page in gup-fast-only.

BUT this is just like GUP before fork() and caught using the
mm->write_protect_seq, so we'd immediately unpin it and not actually
return it from get-user-pages-fast. No harm done AFAIKS.


> And yet, as far as I can tell, that's *exactly* what your gup patches
> do, with gup_pte_range() adding
> 
> +               if (!pte_write(pte) && gup_must_unshare(flags, page, false)) {
> +                       put_compound_head(head, 1, flags);
> +                       goto pte_unmap;
> +               }
> 
> which looks at the page mapcount without holding the mmap sem at all.
> 
> And see my other email - I think there are other examples of your
> patches looking at data that isn't stable because you don't hold the
> right locks.

We rely on PageAnon(), PageKsm() and the mapcount. To my understanding,
they are stable for our use in pagefault handling code under mmap_lock
and in gup-fast because of above reasoning.

> 
> And you can't even do the optimistic case without taking the lock,
> because in your world, a COW that optimistically copies in the case of
> a race condition is fundamentally *wrong* and buggy. Because in your
> world-view, GUP and COW are very different and have different rules,
> but you need things to be *exact*, and they aren't.
> 
> And none of this is anything at least I can think about, because I
> don't see what the "design" is.
> 
> I really have a hard time following what the rules actually are. You
> seem to think that "page_mapcount()" is a really simple rule, and I
> fundamentally disagree. It's a _very_ complicated thing indeed, with
> locking issues, AND YOU ACTIVELY VIOLATE THE LOCKING RULES!
> 
> See why I'm so unhappy?

I see why your unhappy, and I appreciate the productive discussion :)
But I think we just have to complete the big picture of what we're
proposing and how the mapcount is safe to be used for this purpose.

I mean, I'm happy if you actually find a flaw in the current design
proposal.

> 
> We *did* do the page_mapcount() thing. It was bad. It forced COW to
> always take the page lock. There's a very real reason why I'm pushing
> my "let's have a _design_ here", instead of your "let's look at
> page_mapcount without even doing the locking".

The locking semantics just have to be clarified and written in stone --
if we don't find any flaws.

But this will be my last mail for today, have a nice weekend Linus!

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ