[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3e0868e6-c714-1bf8-163f-389989bf5189@redhat.com>
Date: Wed, 22 Dec 2021 09:51:28 +0100
From: David Hildenbrand <david@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Nadav Amit <namit@...are.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
David Rientjes <rientjes@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
John Hubbard <jhubbard@...dia.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Yang Shi <shy828301@...il.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Michal Hocko <mhocko@...nel.org>,
Rik van Riel <riel@...riel.com>,
Roman Gushchin <guro@...com>,
Andrea Arcangeli <aarcange@...hat.com>,
Peter Xu <peterx@...hat.com>,
Donald Dutile <ddutile@...hat.com>,
Christoph Hellwig <hch@....de>,
Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
Linux-MM <linux-mm@...ck.org>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@...r.kernel.org>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via
FAULT_FLAG_UNSHARE (!hugetlb)
On 21.12.21 20:07, Jason Gunthorpe wrote:
> On Tue, Dec 21, 2021 at 06:40:30PM +0100, David Hildenbrand wrote:
>
>> 2) is certainly the cherry on top. But it just means that R/O pins don't
>> have to be the weird kid. And yes, achieving 2) would require
>> FAULT_FLAG_EXCLUSIVE / FAULT_FLAG_UNSHARED, but it would really 99% do
>> what existing COW logic does, just bypass the "map writable" and
>> "trigger write fault" semantics.
>
> I still don't agree with this - when you come to patches can you have
> this work at the end and under a good cover letter? Maybe it will make
> more sense then.
Yes. But really, I think it's the logical consequence of what Linus said
[1]:
"And then all GUP-fast would need to do is to refuse to look up a page
that isn't exclusive to that VM. We already have the situation that
GUP-fast can fail for non-writable pages etc, so it's just another
test."
We must not FOLL_PIN a page that is not exclusive (not only on gup-fast,
but really, on any gup). If we special case R/O FOLL_PIN, we cannot
enable the sanity check on unpin as suggested by Linus [2]:
"If we only set the exclusive VM bit on pages that get mapped into
user space, and we guarantee that GUP only looks up such pages, then
we can also add a debug test to the "unpin" case that the bit is
still set."
There are really only two feasible options I see when we want to take a
R/O FOLL_PIN on a !PageAnonExclusive() anon page
(1) Fail the pinning completely. This implies that we'll have to fail
O_DIRECT once converted to FOLL_PIN.
(2) Request to mark the page PageAnonExclusive() via a
FAULT_FLAG_UNSHARE and let it succeed.
Anything else would require additional accounting that we already
discussed in the past is hard -- for example, to differentiate R/O from
R/W pins requiring two pin counters.
The only impact would be that FOLL_PIN after fork() has to go via a
FAULT_FLAG_UNSHARE once, to turn the page PageAnonExclusive. IMHO this
is the right thing to do for FOLL_LONGTERM. For !FOLL_LONGTERM it would
be nice to optimize this, to *not* do that, but again ... this would
require even more counters I think, for example, to differentiate
between "R/W short/long-term or R/O long-term pin" and "R/O short-term pin".
So unless we discover a way to do additional accounting for ordinary 4k
pages, I think we really can only do (1) or (2) to make sure we never
ever pin a !PageAnonExclusive() page.
[1]
https://lkml.kernel.org/r/CAHk-=wgQq3H6wfkW7+MmduVgBOqHeiXQN97yCMd+m1mM-1xCLQ@mail.gmail.com
[2]
https://lkml.kernel.org/r/CAHk-=wiyxQ==vnHFHW99S_OPwA=u1Qrfg2OGr_6zPcBAuhQY2w@mail.gmail.com
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists