Message-ID: <ZFEtKe/XcnC++ACZ@x1n>
Date:   Tue, 2 May 2023 11:32:57 -0400
From:   Peter Xu <peterx@...hat.com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     Matthew Rosato <mjrosato@...ux.ibm.com>,
        David Hildenbrand <david@...hat.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Lorenzo Stoakes <lstoakes@...il.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jens Axboe <axboe@...nel.dk>,
        Matthew Wilcox <willy@...radead.org>,
        Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>,
        Leon Romanovsky <leon@...nel.org>,
        Christian Benvenuti <benve@...co.com>,
        Nelson Escobar <neescoba@...co.com>,
        Bernard Metzler <bmt@...ich.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Bjorn Topel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Christian Brauner <brauner@...nel.org>,
        Richard Cochran <richardcochran@...il.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        linux-fsdevel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org,
        Oleg Nesterov <oleg@...hat.com>,
        John Hubbard <jhubbard@...dia.com>, Jan Kara <jack@...e.cz>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Pavel Begunkov <asml.silence@...il.com>,
        Mika Penttila <mpenttil@...hat.com>,
        Dave Chinner <david@...morbit.com>,
        Theodore Ts'o <tytso@....edu>
Subject: Re: [PATCH v6 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing
 to file-backed mappings

On Tue, May 02, 2023 at 12:20:46PM -0300, Jason Gunthorpe wrote:
> On Tue, May 02, 2023 at 10:54:35AM -0400, Matthew Rosato wrote:
> > On 5/2/23 10:15 AM, David Hildenbrand wrote:
> > > On 02.05.23 16:04, Jason Gunthorpe wrote:
> > >> On Tue, May 02, 2023 at 03:57:30PM +0200, David Hildenbrand wrote:
> > >>> On 02.05.23 15:50, Jason Gunthorpe wrote:
> > >>>> On Tue, May 02, 2023 at 03:47:43PM +0200, David Hildenbrand wrote:
> > >>>>>> Eventually we want to implement a mechanism where we can dynamically pin in response to RPCIT.
> > >>>>>
> > >>>>> Okay, so IIRC we'll fail starting the domain early, that's good. And if we
> > >>>>> pin all guest memory (instead of small pieces dynamically), there is little
> > >>>>> existing use for file-backed RAM in such zPCI configurations (because memory
> > >>>>> cannot be reclaimed either way if it's all pinned), so likely there are no
> > >>>>> real existing users.
> > >>>>
> > >>>> Right, this is VFIO, the physical HW can't tolerate not having pinned
> > >>>> memory, so something somewhere is always pinning it.
> > >>>>
> > >>>> Which, again, makes it weird/wrong that this KVM code is pinning it
> > >>>> again :\
> > >>>
> > >>> IIUC, that pinning is not for ordinary IOMMU / KVM memory access. It's for
> > >>> passthrough of (adapter) interrupts.
> > >>>
> > >>> I have to speculate, but I guess for hardware to forward interrupts to the
> > >>> VM, it has to pin the special guest memory page that will receive the
> > >>> indications, to then configure (interrupt) hardware to target the interrupt
> > >>> indications to that special guest page (using a host physical address).
> > >>
> > >> Either the emulated access is "CPU" based happening through the KVM
> > >> page table so it should use mmu_notifier locking.
> > >>
> > >> Or it is "DMA" and should go through an IOVA through iommufd pinning
> > >> and locking.
> > >>
> > >> There is no other ground, nothing in KVM should be inventing its own
> > >> access methodology.
> > > 
> > > I might be wrong, but this seems to be a bit different.
> > >
> > > It cannot tolerate page faults (needs a host physical address), so
> > > memory notifiers don't really apply. (as a side note, KVM on s390x
> > > does not use mmu notifiers as we know them)
> >
> > The host physical address is one shared between underlying firmware
> > and the host kvm.  Either might make changes to the referenced page
> > and then issue an alert to the guest via a mechanism called GISA,
> > giving impetus to the guest to look at that page and process the
> > event.  As you say, firmware can't tolerate the page being
> > unavailable; it's expecting that once we feed it that location it's
> > always available until we remove it (kvm_s390_pci_aif_disable).
> 
> That is a CPU access delegated to the FW without any locking scheme to
> make it safe with KVM :\
> 
> It would have been better if FW could inject it through the kvm page
> tables so it has some coherency.
> 
> Otherwise you have to call this "DMA", I think.
> 
> How does s390 avoid mmu notifiers without having lots of problems?? It
> is not really optional to hook the invalidations if you need to build
> a shadow page table..

I have no idea about the s390 details, but per my reading of the above, if
the firmware needs the page to be always available (so there is no way to
fault it in on demand), then a longterm pin seems appropriate here.

Then, if pinning is a must, there's no need for mmu notifiers (as the page
will simply not be invalidated anyway)?

Thanks,

-- 
Peter Xu
