[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <02cf4dcf-74e8-9cbd-ffbf-8888f18a9e8a@redhat.com>
Date: Fri, 17 Dec 2021 22:47:20 +0100
From: David Hildenbrand <david@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
David Rientjes <rientjes@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
John Hubbard <jhubbard@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Yang Shi <shy828301@...il.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Michal Hocko <mhocko@...nel.org>,
Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
Roman Gushchin <guro@...com>,
Andrea Arcangeli <aarcange@...hat.com>,
Peter Xu <peterx@...hat.com>,
Donald Dutile <ddutile@...hat.com>,
Christoph Hellwig <hch@....de>,
Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
Linux-MM <linux-mm@...ck.org>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@...r.kernel.org>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via
FAULT_FLAG_UNSHARE (!hugetlb)
On 17.12.21 22:36, Linus Torvalds wrote:
> On Fri, Dec 17, 2021 at 12:55 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> If we have a shared anonymous page we cannot have GUP references, not
>> even R/O ones. Because GUP would have unshared and copied the page,
>> resulting in a R/O mapped anonymous page.
>
> Doing a GUP on an actual shared page is wrong to begin with.
>
> You even know that, you try to use "page_mapcount() > 1" to disallow it.
GUP is incomaptible with shared anonymous pages, therefore it has to
trigger unsharing, correct.
>
> My point is that it's wrong regardless, and that "mapcount" is
> dubious, and that COW cannot - and must not - use mapcount, and that I
> think your shared case should strive to avoid it for the exact same
> reason.
For now I have not heard a compelling argument why the mapcount is
dubious, I repeat:
* mapcount can only increase due to fork()
* mapcount can decrease due to unmap / zap
We can protect from the transtition == 1 -> >1 using the mmap_lock.
For COW the mapcount is the only thing that matters *if we take GUP* out
of the equation. And that's exactly what we
OTOH, take a look which issues resulted from the page_count changes.
That's what I call dubious, sorry to say.
>
> So, what I think should happen is:
>
> (a) GUP makes sure that it only ever looks up pages that can be
> shared with this VM. This may in involve breaking COW early with any
> past fork().
Is that unsharing as we propose it?
>
> (b) it marks such pages so that any future work will not cause them
> to COW either
Right, exactly. GUP before fork does not result in a page getting shared
again.
>
> Note that (a) is not necessarily "always COW and have to allocate and
> copy new page". In particular, if the page is already writable, you
> know you already have exclusive access to it and don't need to COW.
>
> And if it isn't writable, then the other common case is "the cow has
> only one user, and it's us" - that's the "refcount == 1" case.
>
> And (b) is what we do with that page_maybe_dma_pinned() logic for
> fork(), but also for things like swap cache creation (eg see commit
> feb889fb40fa: "mm: don't put pinned pages into the swap cache").
I fully agree with b). GUP before fork is a totally different set of
problems than GUP after fork.
>
> Note that this code all already exists, and already works - even
> without getting the (very expensive) mmap_sem. So it works with
> fast-GUP and it can race with concurrent forking by another thread,
> which is why we also have that seqcount thing.
I know, I studied it intensively :)
>
> As far as I can tell, your "mapcount" logic fundamentally requires
> mmap_sem for the fork() race avoidance, for example.
Yes. Or any other more lightweight synchronization in the future. For
now this is just perfect.
>
> So this is why I don't like the mapcount games - I think they are very
> fragile, and not at all as logical as the two simple rules a/b above.
I don't really see anything fragile, really. I'm happy to learn as always.
>
> I believe you can make mapcount games _work_ - we used to have
> something like that. It was incredibly fragile, and it had its own set
> of bugs, but with enough care it's doable.
We made it work, and it was comparatively simple.
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists