lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <02cf4dcf-74e8-9cbd-ffbf-8888f18a9e8a@redhat.com>
Date:   Fri, 17 Dec 2021 22:47:20 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        Linux-MM <linux-mm@...ck.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via
 FAULT_FLAG_UNSHARE (!hugetlb)

On 17.12.21 22:36, Linus Torvalds wrote:
> On Fri, Dec 17, 2021 at 12:55 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> If we have a shared anonymous page we cannot have GUP references, not
>> even R/O ones. Because GUP would have unshared and copied the page,
>> resulting in a R/O mapped anonymous page.
> 
> Doing a GUP on an actual shared page is wrong to begin with.
> 
> You even know that, you try to use "page_mapcount() > 1" to disallow it.

GUP is incomaptible with shared anonymous pages, therefore it has to
trigger unsharing, correct.

> 
> My point is that it's wrong regardless, and that "mapcount" is
> dubious, and that COW cannot - and must not - use mapcount, and that I
> think your shared case should strive to avoid it for the exact same
> reason.

For now I have not heard a compelling argument why the mapcount is
dubious, I repeat:

* mapcount can only increase due to fork()
* mapcount can decrease due to unmap / zap

We can protect from the transtition == 1 -> >1 using the mmap_lock.

For COW the mapcount is the only thing that matters *if we take GUP* out
of the equation. And that's exactly what we

OTOH, take a look which issues resulted from the page_count changes.
That's what I call dubious, sorry to say.

> 
> So, what I think should happen is:
> 
>  (a) GUP makes sure that it only ever looks up pages that can be
> shared with this VM. This may in involve breaking COW early with any
> past fork().

Is that unsharing as we propose it?

> 
>  (b) it marks such pages so that any future work will not cause them
> to COW either

Right, exactly. GUP before fork does not result in a page getting shared
again.

> 
> Note that (a) is not necessarily "always COW and have to allocate and
> copy new page". In particular, if the page is already writable, you
> know you already have exclusive access to it and don't need to COW.
> 
> And if it isn't writable, then the other common case is "the cow has
> only one user, and it's us" - that's the "refcount == 1" case.
> 
> And (b) is what we do with that page_maybe_dma_pinned() logic for
> fork(), but also for things like swap cache creation (eg see commit
> feb889fb40fa: "mm: don't put pinned pages into the swap cache").

I fully agree with b). GUP before fork is a totally different set of
problems than GUP after fork.

> 
> Note that this code all already exists, and already works - even
> without getting the (very expensive) mmap_sem. So it works with
> fast-GUP and it can race with concurrent forking by another thread,
> which is why we also have that seqcount thing.

I know, I studied it intensively :)

> 
> As far as I can tell, your "mapcount" logic fundamentally requires
> mmap_sem for the fork() race avoidance, for example.

Yes. Or any other more lightweight synchronization in the future. For
now this is just perfect.

> 
> So this is why I don't like the mapcount games - I think they are very
> fragile, and not at all as logical as the two simple rules a/b above.

I don't really see anything fragile, really. I'm happy to learn as always.

> 
> I believe you can make mapcount games _work_ - we used to have
> something like that. It was incredibly fragile, and it had its own set
> of bugs, but with enough care it's doable.

We made it work, and it was comparatively simple.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ