Message-ID: <X/jr8QfeolQwn39f@redhat.com>
Date: Fri, 8 Jan 2021 18:34:09 -0500
From: Andrea Arcangeli <aarcange@...hat.com>
To: Andy Lutomirski <luto@...nel.org>
Cc: Jason Gunthorpe <jgg@...pe.ca>, Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>, Yu Zhao <yuzhao@...gle.com>,
Peter Xu <peterx@...hat.com>,
Pavel Emelyanov <xemul@...nvz.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
Minchan Kim <minchan@...nel.org>,
Will Deacon <will@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
Kees Cook <keescook@...omium.org>,
John Hubbard <jhubbard@...dia.com>,
Leon Romanovsky <leonro@...dia.com>, Jan Kara <jack@...e.cz>,
Kirill Tkhai <ktkhai@...tuozzo.com>
Subject: Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy
On Fri, Jan 08, 2021 at 10:31:24AM -0800, Andy Lutomirski wrote:
> Can we just remove vmsplice() support? We could make it do a normal
The only really cool use of vmsplice I've seen so far is localhost
live migration of qemu. However, despite being really cool, it wasn't
merged in the end, and I don't recall exactly why.
There are even more efficient (but slightly more complex) ways to do
that than vmsplice: use MAP_SHARED gigapages, or MAP_SHARED tmpfs with
THP opted in on the tmpfs mount, as guest physical memory instead of
anon memory, and find a way to keep it from being cleared by kexec, so
you can also upgrade the host kernel and not just qemu... That is far
more optimal than having to pin and move all pages through the pipe
and still pay a superfluous copy on the destination.
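A minimal sketch of the shared-mapping idea, under stated assumptions:
memfd_create() stands in for a file on a tmpfs mount, two mappings in
one process stand in for the old and new qemu, and the helper names,
the "guest-ram" label and the 2 MiB size are hypothetical. It only
shows that a MAP_SHARED region is visible through a second mapping
without any copy; it does not cover THP, gigapages or kexec.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of a shmem-backed fd MAP_SHARED. */
static void *map_shared(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Returns 0 if data written through one mapping is visible through a
 * second mapping of the same file, -1 on error or mismatch.
 */
static int shared_region_demo(void)
{
    const size_t len = 2UL << 20;          /* illustrative size only */
    const char msg[] = "guest page contents";
    char *src = NULL, *dst = NULL;
    int ret = -1;
    int fd;

    assert(sizeof(msg) <= len);

    fd = memfd_create("guest-ram", 0);     /* stand-in for a tmpfs file */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0)
        goto out_close;

    src = map_shared(fd, len);             /* "source" side */
    dst = map_shared(fd, len);             /* "destination" side */
    if (!src || !dst)
        goto out_unmap;

    /* Write on one side; the other side sees the same pages. */
    memcpy(src, msg, sizeof(msg));
    if (memcmp(dst, msg, sizeof(msg)) == 0)
        ret = 0;

out_unmap:
    if (src)
        munmap(src, len);
    if (dst)
        munmap(dst, len);
out_close:
    close(fd);
    return ret;
}
```

In a real setup the destination would be a different process that
receives the fd (or opens the same tmpfs file) and maps it itself;
calling shared_region_demo() only checks the no-copy visibility.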
My guess why it's not popular, and I may be completely wrong on this
since I've basically never used vmsplice (other than for a proof of
concept DoS of my phone to verify the long term GUP pin exploit
works), is that vmsplice is a more efficient option, but not the most
efficient one.
Exactly as with in-place live migration, it's always more efficient to
share a tmpfs THP backed region and have true zero copy than to go
through a pipe that still does one copy at the receiving end. Sharing
may also be simpler, and it doesn't depend on obscure F_SETPIPE_SZ
tunings. So in the end vmsplice is still too slow for apps that
require maximum performance, and not worth the extra work for those
that don't.
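For comparison, a rough sketch of the pipe path, assuming a single
thread plays both sender and receiver. The function name and the
1 MiB pipe size are hypothetical choices for illustration. The sender
side hands page references to the pipe with vmsplice(), and the
read() on the receiving end is the one remaining copy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Sends `len` bytes from `src` through a pipe with vmsplice() and reads
 * them back into `dst`. Returns 0 on success, -1 on error.
 * `len` must fit in the pipe, because one thread does both ends here.
 */
static int vmsplice_roundtrip(void *src, char *dst, size_t len)
{
    struct iovec iov = { .iov_base = src, .iov_len = len };
    size_t got = 0;
    ssize_t sent;
    int pfd[2];
    int ret = -1;

    assert(src && dst && len > 0);

    if (pipe(pfd) < 0)
        return -1;

    /* Optional tuning; the value is illustrative and failure is ignored. */
    (void)fcntl(pfd[1], F_SETPIPE_SZ, 1 << 20);

    /* Sender: the pipe references the user pages, no copy on this side. */
    sent = vmsplice(pfd[1], &iov, 1, 0);
    if (sent != (ssize_t)len)
        goto out;

    /* Receiver: read() copies out of the pipe into `dst`. */
    while (got < len) {
        ssize_t n = read(pfd[0], dst + got, len - got);
        if (n <= 0)
            goto out;
        got += (size_t)n;
    }
    ret = 0;
out:
    close(pfd[0]);
    close(pfd[1]);
    return ret;
}
```

A caller would pass a source buffer no larger than the pipe capacity,
for example one page, and compare `dst` against `src` afterwards.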
I love vmsplice conceptually, I'd just rather an luser couldn't
run it.
> copy, thereby getting rid of a fair amount of nastiness and potential
> attacks. Even ignoring issues relating to the length of time that the
> vmsplice reference is alive, we also have whatever problems could be
> caused by a malicious or misguided user vmsplice()ing some memory and
> then modifying it.
Sorry to ask, but I'm curious: what else goes wrong if the user
modifies memory under a GUP pin from vmsplice? That's not obvious.
Thanks,
Andrea