Message-ID: <9ece62e2-df7b-09b0-d96a-8105048c0ed0@oracle.com>
Date: Sun, 7 Mar 2021 22:59:10 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Bharat Bhushan <bbhushan2@...vell.com>
Cc: "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Sunil Kovvuri Goutham <sgoutham@...vell.com>,
"terminus@...il.com" <terminus@...il.com>
Subject: Re: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous
calls
On 2021-03-02 4:47 a.m., Bharat Bhushan wrote:
> Hi Ankur,
>
>> -----Original Message-----
>> From: Ankur Arora <ankur.a.arora@...cle.com>
>> Sent: Friday, February 26, 2021 6:24 AM
>> To: Bharat Bhushan <bbhushan2@...vell.com>
>> Cc: alex.williamson@...hat.com; ankur.a.arora@...cle.com; linux-
>> kernel@...r.kernel.org; Sunil Kovvuri Goutham <sgoutham@...vell.com>;
>> terminus@...il.com
>> Subject: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous calls
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> Hi Bharat,
>>
>> Can you test the patch below to see if it works for you?
>
> Sorry for the late reply, I actually missed this email.
>
> Reproducibility of the issue was low in my test scenario, about one out of ~15 runs. I ran it multiple times overnight and observed no issues.
Awesome. Would you mind giving me your Tested-by for the
patch?
>> Also could you add some more detail to your earlier description of
>> the bug?
>
> Our test case runs a multi-threaded ODP application, where the parent process maps the region (yes, it uses MAP_DMA) and the child processes then access the same region. As a workaround we tried having the parent process access the region once before creating the other accessor threads, and it worked as expected.
Thanks for the detail. So if the child processes start early, they
might fault while the VFIO_IOMMU_MAP_DMA is still in progress. And,
since they only acquire mmap_lock in read mode, both paths would end
up calling io_remap_pfn_range() via the fault handler.
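To spell out one possible interleaving, here is a rough sketch of the
pre-patch vfio_pci_mmap_fault() flow (simplified, not the exact
upstream code; the thread roles are just for illustration):

    thread A (parent, MAP_DMA faulting    thread B (child touching the
              in the vma)                           mapping)
    ----------------------------------    --------------------------------
    fault on the vma                      fault on the same vma
    vfio_pci_mmap_fault()                 vfio_pci_mmap_fault()
      mutex_lock(&vdev->vma_lock)           ...blocks on vma_lock...
      __vfio_pci_add_vma(vdev, vma)
      mutex_unlock(&vdev->vma_lock)         mutex_lock(&vdev->vma_lock)
      io_remap_pfn_range(vma, ...)          __vfio_pci_add_vma(vdev, vma)
        ...populating PTEs...               mutex_unlock(&vdev->vma_lock)
                                            io_remap_pfn_range(vma, ...)
                                              BUG_ON(!pte_none(*pte))

vma_lock only serializes the insertion into vdev->vma_list, not the
remap itself, so whichever io_remap_pfn_range() runs second walks PTEs
the first one already populated and trips the BUG_ON in
remap_pte_range(). The patch (quoted below) closes that window by
treating presence on vma_list as "already mapped (or being mapped)".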
Thanks
Ankur
>
> Thanks
> -Bharat
>
>> In particular, AFAICS you are using ODP (-DPDK?) with multiple
>> threads touching this region. From your stack, it looks like the
>> fault was user-space generated, and I'm guessing you were not
>> using the VFIO_IOMMU_MAP_DMA.
>>
>> Ankur
>>
>> -- >8 --
>>
>> Subject: [PATCH] vfio-pci: protect io_remap_pfn_range() from simultaneous calls
>>
>> vfio_pci_mmap_fault() maps the complete VMA on fault. With concurrent
>> faults, this would result in multiple calls to io_remap_pfn_range(),
>> with the second call hitting a BUG_ON(!pte_none(*pte)) in remap_pte_range().
>> (It would also link the same VMA multiple times in vdev->vma_list,
>> but given the BUG_ON that is less serious.)
>>
>> Normally, however, this won't happen: at least with vfio_iommu_type1,
>> the VFIO_IOMMU_MAP_DMA path is protected by iommu->lock.
>>
>> If, however, we are using some kind of parallelization mechanism, like
>> the ktask one under discussion [1], we would hit this.
>> Even if we were doing this serially, given that vfio-pci remaps a larger
>> extent than strictly necessary, it should internally enforce the coherence
>> of its data structures.
>>
>> Handle this by using the VMA's presence in vdev->vma_list as
>> indicative of a fully mapped VMA and returning success early to
>> all but the first VMA fault. Note that this is clearly optimistic given
>> that the mapping is ongoing, and might mean that the caller sees
>> more faults until the remap is done.
>>
>> [1] https://lore.kernel.org/linux-mm/20181105145141.6f9937f6@w520.home/
>>
>> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
>> ---
>> drivers/vfio/pci/vfio_pci.c | 25 ++++++++++++++++++++++++-
>> 1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 65e7e6b44578..b9f509863db1 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -1573,6 +1573,11 @@ static int __vfio_pci_add_vma(struct vfio_pci_device *vdev,
>> {
>> struct vfio_pci_mmap_vma *mmap_vma;
>>
>> + list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
>> + if (mmap_vma->vma == vma)
>> + return 1;
>> + }
>> +
>> mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
>> if (!mmap_vma)
>> return -ENOMEM;
>> @@ -1613,6 +1618,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>> struct vm_area_struct *vma = vmf->vma;
>> struct vfio_pci_device *vdev = vma->vm_private_data;
>> vm_fault_t ret = VM_FAULT_NOPAGE;
>> + int vma_present;
>>
>> mutex_lock(&vdev->vma_lock);
>> down_read(&vdev->memory_lock);
>> @@ -1623,7 +1629,21 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>> goto up_out;
>> }
>>
>> - if (__vfio_pci_add_vma(vdev, vma)) {
>> + /*
>> + * __vfio_pci_add_vma() either adds the vma to the vdev->vma_list
>> + * (vma_present == 0), or indicates that the vma is already present
>> + * on the list (vma_present == 1).
>> + *
>> + * Overload the meaning of this flag to also imply that the vma is
>> + * fully mapped. This allows us to serialize the mapping -- ensuring
>> + * that simultaneous faults will not both try to call
>> + * io_remap_pfn_range().
>> + *
>> + * However, this might mean that callers to which we returned success
>> + * optimistically will see more faults until the remap is complete.
>> + */
>> + vma_present = __vfio_pci_add_vma(vdev, vma);
>> + if (vma_present < 0) {
>> ret = VM_FAULT_OOM;
>> mutex_unlock(&vdev->vma_lock);
>> goto up_out;
>> @@ -1631,6 +1651,9 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>>
>> mutex_unlock(&vdev->vma_lock);
>>
>> + if (vma_present)
>> + goto up_out;
>> +
>> if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
>> vma->vm_end - vma->vm_start, vma->vm_page_prot))
>> ret = VM_FAULT_SIGBUS;
>> --
>> 2.29.2
>