Message-ID: <20200928235739.GU9916@ziepe.ca>
Date: Mon, 28 Sep 2020 20:57:39 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Xu <peterx@...hat.com>, Leon Romanovsky <leonro@...dia.com>,
John Hubbard <jhubbard@...dia.com>,
Linux-MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Jan Kara <jack@...e.cz>, Michal Hocko <mhocko@...e.com>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
Kirill Shutemov <kirill@...temov.name>,
Hugh Dickins <hughd@...gle.com>,
Christoph Hellwig <hch@....de>,
Andrea Arcangeli <aarcange@...hat.com>,
Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned

On Mon, Sep 28, 2020 at 12:29:55PM -0700, Linus Torvalds wrote:
> So a read pin action would basically never work for the fast-path for
> a few cases, notably a shared read-only mapping - because we could
> never mark it in the page tables as "fast pin accessible"

Agree, I was assuming we'd lose more of the fast path to create this
thing. It would only still be fast if the pages are already writable.

I strongly suspect the case of DMA'ing actual read-only data is the
minority here, the usual case is probably filling a writable buffer
with something interesting and then triggering the DMA. The DMA just
happens to be read from the driver view so the driver doesn't set
FOLL_WRITE.

Looking at the FOLL_LONGTERM users, which should be the banner use case
for this, there are very few that do a read pin and use fast.

> And it would basically have no advantages over a writable FOLL_PIN. It
> would break the association with any backing store for private pages,
> because otherwise it can't follow future writes.

Yes, I wasn't clear enough, I'm looking at this from a driver API
perspective. We have this API

    pin_user_pages(FOLL_LONGTERM | FOLL_WRITE)

Which now has no decoherence issues with the MM. If the driver
naturally wants to do read-only access it might be tempted to do:

    pin_user_pages(FOLL_LONGTERM)

Which is now NOT the same thing and brings all these really surprising
mm coherence issues back.

The driver author might discover this in testing, then be tempted to
hardwire 'FOLL_LONGTERM | FOLL_WRITE'. Now their uAPI is broken for
things that are actually read-only like .rodata.

If they discover this then they add a FOLL_FORCE to the mix.

When someone comes along to read this later it is a big leap to see

    pin_user_pages(FOLL_LONGTERM | FOLL_FORCE | FOLL_WRITE)

and realize this is code for "read only mapping". At least it took me
a while to decipher it the first time I saw it.

I think this is really hard to use and ugly. My thinking has been to
just stick:

    if (flags & FOLL_LONGTERM)
        flags |= FOLL_FORCE | FOLL_WRITE;

In pin_user_pages(). It would make the driver API cleaner. If we can
do a bit better somehow by not COW'ing for certain VMAs as you
explained then all the better, but not my primary goal.
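To make the proposal concrete, here is a rough userspace sketch of that flag fixup. It is only an illustration under stated assumptions: the FOLL_* bit values below are placeholders and not the kernel's real definitions, and normalize_pin_flags() is a hypothetical helper name standing in for the check that would sit inside pin_user_pages().

```c
/*
 * Placeholder bit values, for illustration only. These are NOT the
 * kernel's definitions of the FOLL_* flags.
 */
#define FOLL_WRITE    0x01u
#define FOLL_FORCE    0x02u
#define FOLL_LONGTERM 0x04u

/*
 * Hypothetical helper: models the fixup proposed above. Any long-term
 * pin is upgraded to a forced write pin, so a driver that only asks for
 * FOLL_LONGTERM gets the same behavior as one that asks for
 * FOLL_LONGTERM | FOLL_FORCE | FOLL_WRITE. Other requests pass through
 * unchanged.
 */
static unsigned int normalize_pin_flags(unsigned int flags)
{
	if (flags & FOLL_LONGTERM)
		flags |= FOLL_FORCE | FOLL_WRITE;
	return flags;
}
```

With this in place, normalize_pin_flags(FOLL_LONGTERM) and normalize_pin_flags(FOLL_LONGTERM | FOLL_WRITE) return the same value, while a request without FOLL_LONGTERM is left alone. The driver would then write the natural read-only form and the core code would supply the rest.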

Basically, I think if a driver is using FOLL_LONGTERM | FOLL_PIN we
should guarantee that driver a consistent MM and take the gup_fast
performance hit to do it.

AFAICT the giant whack of other cases not using FOLL_LONGTERM really
shouldn't care about read-decoherence. For those cases the user should
really not be racing writes with data under read-only pin, and the
new COW logic looks like it solves the other issues with this.

I know Jann/John have been careful to not have special behaviors for
the DMA case, but I think it makes sense here. It is actually different.

Jason