[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DM6PR18MB2698B4CB0114D7D1FCE74B4BE3999@DM6PR18MB2698.namprd18.prod.outlook.com>
Date: Tue, 2 Mar 2021 12:47:53 +0000
From: Bharat Bhushan <bbhushan2@...vell.com>
To: Ankur Arora <ankur.a.arora@...cle.com>
CC: "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Sunil Kovvuri Goutham <sgoutham@...vell.com>,
"terminus@...il.com" <terminus@...il.com>
Subject: RE: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous
calls
Hi Ankur,
> -----Original Message-----
> From: Ankur Arora <ankur.a.arora@...cle.com>
> Sent: Friday, February 26, 2021 6:24 AM
> To: Bharat Bhushan <bbhushan2@...vell.com>
> Cc: alex.williamson@...hat.com; ankur.a.arora@...cle.com; linux-
> kernel@...r.kernel.org; Sunil Kovvuri Goutham <sgoutham@...vell.com>;
> terminus@...il.com
> Subject: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous calls
>
> External Email
>
> ----------------------------------------------------------------------
> Hi Bharat,
>
> Can you test the patch below to see if it works for you?
Sorry for late reply, I actually missed this email.
Reproducibility of the issue was low in my test scenario, one out of ~15 runs. I run it multiple times, overnight and observed no issues.
>
> Also could you add some more detail to your earlier description of
> the bug?
Our test case is running ODP multi-threaded application, where parent process maps (yes it uses MAP_DMA) the region and then child processes access same. As a workaround we tried accessing the region once by parent process before creating other accessor threads and it worked as expected.
Thanks
-Bharat
> In particular, AFAICS you are using ODP (-DPDK?) with multiple
> threads touching this region. From your stack, it looks like the
> fault was user-space generated, and I'm guessing you were not
> using the VFIO_IOMMU_MAP_DMA.
>
> Ankur
>
> -- >8 --
>
> Subject: [PATCH] vfio-pci: protect io_remap_pfn_range() from simultaneous calls
>
> vfio_pci_mmap_fault() maps the complete VMA on fault. With concurrent
> faults, this would result in multiple calls to io_remap_pfn_range(),
> where it would hit a BUG_ON(!pte_none(*pte)) in remap_pte_range().
> (It would also link the same VMA multiple times in vdev->vma_list
> but given the BUG_ON that is less serious.)
>
> Normally, however, this won't happen -- at least with vfio_iommu_type1 --
> the VFIO_IOMMU_MAP_DMA path is protected by iommu->lock.
>
> If, however, we are using some kind of parallelization mechanism like
> this one with ktask under discussion [1], we would hit this.
> Even if we were doing this serially, given that vfio-pci remaps a larger
> extent than strictly necessary it should internally enforce coherence of
> its data structures.
>
> Handle this by using the VMA's presence in the vdev->vma_list as
> indicative of a fully mapped VMA and returning success early to
> all but the first VMA fault. Note that this is clearly optimstic given
> that the mapping is ongoing, and might mean that the caller sees
> more faults until the remap is done.
>
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_linux-
> 2Dmm_20181105145141.6f9937f6-
> 40w520.home_&d=DwIDAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=PAAlWswPe7d8gHl
> GbCLmy2YezyK7O3Hv_t2heGnouBw&m=3ZDXqnn9xNUCjgXwN9mHIKT7oyXu55P
> U7yV2j0b-5hw&s=hiICkNtrcH4AbAWRrbkvMUylp7Bv0YHFCjxNGC6CGOk&e=
>
> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 25 ++++++++++++++++++++++++-
> 1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 65e7e6b44578..b9f509863db1 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1573,6 +1573,11 @@ static int __vfio_pci_add_vma(struct vfio_pci_device
> *vdev,
> {
> struct vfio_pci_mmap_vma *mmap_vma;
>
> + list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
> + if (mmap_vma->vma == vma)
> + return 1;
> + }
> +
> mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
> if (!mmap_vma)
> return -ENOMEM;
> @@ -1613,6 +1618,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault
> *vmf)
> struct vm_area_struct *vma = vmf->vma;
> struct vfio_pci_device *vdev = vma->vm_private_data;
> vm_fault_t ret = VM_FAULT_NOPAGE;
> + int vma_present;
>
> mutex_lock(&vdev->vma_lock);
> down_read(&vdev->memory_lock);
> @@ -1623,7 +1629,21 @@ static vm_fault_t vfio_pci_mmap_fault(struct
> vm_fault *vmf)
> goto up_out;
> }
>
> - if (__vfio_pci_add_vma(vdev, vma)) {
> + /*
> + * __vfio_pci_add_vma() either adds the vma to the vdev->vma_list
> + * (vma_present == 0), or indicates that the vma is already present
> + * on the list (vma_present == 1).
> + *
> + * Overload the meaning of this flag to also imply that the vma is
> + * fully mapped. This allows us to serialize the mapping -- ensuring
> + * that simultaneous faults will not both try to call
> + * io_remap_pfn_range().
> + *
> + * However, this might mean that callers to which we returned success
> + * optimistically will see more faults until the remap is complete.
> + */
> + vma_present = __vfio_pci_add_vma(vdev, vma);
> + if (vma_present < 0) {
> ret = VM_FAULT_OOM;
> mutex_unlock(&vdev->vma_lock);
> goto up_out;
> @@ -1631,6 +1651,9 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault
> *vmf)
>
> mutex_unlock(&vdev->vma_lock);
>
> + if (vma_present)
> + goto up_out;
> +
> if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> vma->vm_end - vma->vm_start, vma-
> >vm_page_prot))
> ret = VM_FAULT_SIGBUS;
> --
> 2.29.2
Powered by blists - more mailing lists