[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <X/pzhKj2m48brSdN@redhat.com>
Date: Sat, 9 Jan 2021 22:24:52 -0500
From: Andrea Arcangeli <aarcange@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Linux-MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Yu Zhao <yuzhao@...gle.com>, Andy Lutomirski <luto@...nel.org>,
Peter Xu <peterx@...hat.com>,
Pavel Emelyanov <xemul@...nvz.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
Minchan Kim <minchan@...nel.org>,
Will Deacon <will@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Hugh Dickins <hughd@...gle.com>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Matthew Wilcox <willy@...radead.org>,
Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
Kees Cook <keescook@...omium.org>,
John Hubbard <jhubbard@...dia.com>,
Leon Romanovsky <leonro@...dia.com>,
Jason Gunthorpe <jgg@...pe.ca>, Jan Kara <jack@...e.cz>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
Nadav Amit <nadav.amit@...il.com>, Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse
On Sat, Jan 09, 2021 at 05:37:09PM -0800, Linus Torvalds wrote:
> On Sat, Jan 9, 2021 at 5:19 PM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > And no, I didn't make the UFFDIO_WRITEPROTECT code take the mmap_sem
> > for writing. For whoever wants to look at that, it's
> > mwriteprotect_range() in mm/userfaultfd.c and the fix is literally to
> > turn the read-lock (and unlock) into a write-lock (and unlock).
>
> Oh, and if it wasn't obvious, we'll have to debate what to do with
> trying to mprotect() a pinned page. Do we just ignore the pinned page
> (the way my clear_refs patch did)? Or do we turn it into -EBUSY? Or
> what?
Agreed, I assume mprotect would have the same effect.
mprotect in parallel of a read or recvmgs may be undefined, so I
didn't bring it up, but it was pretty clear.
The moment the write bit is cleared (no matter why and from who) and
the PG lock relased, if there's any GUP pin, GUP currently loses
synchrony.
In any case I intended to help exercising the new page_count logic
with the testcase, possibly to make it behave better somehow, no
matter how.
I admit I'm also wondering myself the exact semantics of O_DIRECT on
clear_refs or uffd-wp tracking, but the point is that losing reads and
getting unexpected data in the page, still doesn't look a good
behavior and it had to be at least checked.
To me ultimately the useful use case that is become impossible with
page_count isn't even clear_refs nor uffd-wp.
The useful case that I can see zero fundamental flaws in it, is a RDMA
or some other device computing in pure readonly DMA on the data while
a program runs normally and produces it. It could be even a
framebuffer that doesn't care about coherency. You may want to
occasionally wrprotect the memory under readonly long term GUP pin for
consistency even against bugs of the program itself. Why should
wrprotecting make the device lose synchrony? And kind of performance
we gain to the normal useful cases by breaking the special case? Is
there a benchmark showing it?
> So it's not *just* the locking that needs to be fixed. But just take a
> look at that suggested clear_refs patch of mine - it sure isn't
> complicated.
If we can skip the wrprotection it's fairly easy, I fully agree, even
then it still looks more difficult than using page_mapcount in
do_wp_page in my view, so I also don't see the simplification. And
overall the amount of kernel code had a net increase as result.
Thanks,
Andrea
Powered by blists - more mailing lists