Message-ID: <20230508145715.630fe3ae.alex.williamson@redhat.com>
Date: Mon, 8 May 2023 14:57:15 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Yan Zhao <yan.y.zhao@...el.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, kevin.tian@...el.com,
yishaih@...dia.com, shameerali.kolothum.thodi@...wei.com,
Cédric Le Goater <clg@...hat.com>
Subject: Re: [PATCH] vfio/pci: take mmap write lock for io_remap_pfn_range
On Mon, 8 May 2023 13:48:30 -0300
Jason Gunthorpe <jgg@...dia.com> wrote:
> On Mon, May 08, 2023 at 08:58:42PM +0800, Yan Zhao wrote:
> > In VFIO type1, vaddr_get_pfns() will try to fault in MMIO PFNs after
> > pin_user_pages_remote() returns -EFAULT.
> >
> > follow_fault_pfn
> >  fixup_user_fault
> >   handle_mm_fault
> >    handle_mm_fault
> >     do_fault
> >      do_shared_fault
> >       do_fault
> >        __do_fault
> >         vfio_pci_mmap_fault
> >          io_remap_pfn_range
> >           remap_pfn_range
> >            track_pfn_remap
> >             vm_flags_set ==> mmap_assert_write_locked(vma->vm_mm)
> >            remap_pfn_range_notrack
> >             vm_flags_set ==> mmap_assert_write_locked(vma->vm_mm)
> >
> > As io_remap_pfn_range() calls vm_flags_set() to update vm_flags [1],
> > the mmap write lock must be held.
> > So, update vfio_pci_mmap_fault() to drop the mmap read lock and take the
> > mmap write lock.
> >
> > [1] https://lkml.kernel.org/r/20230126193752.297968-3-surenb@google.com
> > commit bc292ab00f6c ("mm: introduce vma->vm_flags wrapper functions")
> > commit 1c71222e5f23
> > ("mm: replace vma->vm_flags direct modifications with modifier calls")
> >
> > Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> > ---
> > drivers/vfio/pci/vfio_pci_core.c | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index a5ab416cf476..5082f89152b3 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1687,6 +1687,12 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> > struct vfio_pci_mmap_vma *mmap_vma;
> > vm_fault_t ret = VM_FAULT_NOPAGE;
> >
> > + mmap_assert_locked(vma->vm_mm);
> > + mmap_read_unlock(vma->vm_mm);
> > +
> > + if (mmap_write_lock_killable(vma->vm_mm))
> > + return VM_FAULT_RETRY;
>
> Certainly not..
>
> I'm not sure how to resolve this properly, set the flags in advance?
>
> The address space conversion?

We already try to set the flags in advance, but some architecture-specific
flags, such as VM_PAT, make that tricky. Cédric has been looking at
inserting individual pages with vmf_insert_pfn(), but that incurs many
more faults, and therefore more latency, than remapping the entire vma
on fault. I suspect we may be better off removing the fault handler
entirely, but I haven't tried it yet, so I don't know what gotchas lie
down that path. Thanks,
Alex
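
To make the fault-count trade-off mentioned above concrete, here is a
minimal userspace toy model, not kernel code. Everything prefixed toy_
or TOY_ is hypothetical and exists only for illustration. It assumes that
a per-page handler (in the spirit of vmf_insert_pfn()) maps exactly one
page per fault, while a whole-vma handler (in the spirit of
io_remap_pfn_range() over the full range) maps every page on the first
fault. It does not model locking, VM_PAT, or the cost of a single fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PAGES 4096

/* Hypothetical stand-in for a vma backed by device MMIO. */
struct toy_vma {
	size_t npages;               /* pages covered by the mapping */
	bool mapped[TOY_MAX_PAGES];  /* true once a page has a PTE */
	size_t faults;               /* number of faults taken so far */
};

enum toy_strategy {
	TOY_MAP_ONE_PAGE,   /* models inserting a single PFN per fault */
	TOY_MAP_WHOLE_VMA,  /* models remapping the entire vma on fault */
};

static void toy_vma_init(struct toy_vma *vma, size_t npages)
{
	assert(npages <= TOY_MAX_PAGES);
	memset(vma, 0, sizeof(*vma));
	vma->npages = npages;
}

/* Toy fault handler: count the fault, then populate per the strategy. */
static void toy_fault(struct toy_vma *vma, size_t pgoff, enum toy_strategy s)
{
	vma->faults++;
	if (s == TOY_MAP_ONE_PAGE) {
		vma->mapped[pgoff] = true;
		return;
	}
	for (size_t i = 0; i < vma->npages; i++)
		vma->mapped[i] = true;
}

/* Touch every page once, faulting on unmapped pages; return fault count. */
static size_t toy_touch_all(struct toy_vma *vma, enum toy_strategy s)
{
	for (size_t i = 0; i < vma->npages; i++)
		if (!vma->mapped[i])
			toy_fault(vma, i, s);
	return vma->faults;
}
```

In this model, touching all 256 pages of a 1 MiB region with 4 KiB pages
takes 256 faults with TOY_MAP_ONE_PAGE and a single fault with
TOY_MAP_WHOLE_VMA. Real access patterns and per-fault cost will differ.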