linux-kernel - RE: [EXT] Re: vfio-pci: protect remap_pfn

Message-ID: <DM6PR18MB269865BC107C37F100743427E3939@DM6PR18MB2698.namprd18.prod.outlook.com>
Date:   Mon, 8 Mar 2021 07:03:03 +0000
From:   Bharat Bhushan <bbhushan2@...vell.com>
To:     Ankur Arora <ankur.a.arora@...cle.com>
CC:     "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Sunil Kovvuri Goutham <sgoutham@...vell.com>,
        "terminus@...il.com" <terminus@...il.com>
Subject: RE: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous
 calls



> -----Original Message-----
> From: Ankur Arora <ankur.a.arora@...cle.com>
> Sent: Monday, March 8, 2021 12:29 PM
> To: Bharat Bhushan <bbhushan2@...vell.com>
> Cc: alex.williamson@...hat.com; linux-kernel@...r.kernel.org; Sunil Kovvuri
> Goutham <sgoutham@...vell.com>; terminus@...il.com
> Subject: Re: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous
> calls
> 
> On 2021-03-02 4:47 a.m., Bharat Bhushan wrote:
> > Hi Ankur,
> >
> >> -----Original Message-----
> >> From: Ankur Arora <ankur.a.arora@...cle.com>
> >> Sent: Friday, February 26, 2021 6:24 AM
> >> To: Bharat Bhushan <bbhushan2@...vell.com>
> >> Cc: alex.williamson@...hat.com; ankur.a.arora@...cle.com;
> >> linux-kernel@...r.kernel.org; Sunil Kovvuri Goutham <sgoutham@...vell.com>;
> >> terminus@...il.com
> >> Subject: [EXT] Re: vfio-pci: protect remap_pfn_range() from
> >> simultaneous calls
> >>
> >> Hi Bharat,
> >>
> >> Can you test the patch below to see if it works for you?
> >
> > Sorry for late reply, I actually missed this email.
> >
> > Reproducibility of the issue was low in my test scenario, one out of ~15 runs. I
> > ran it multiple times overnight and observed no issues.
> 
> Awesome. Would you mind giving me your Tested-by for the patch?

Sure, please point me to the patch if it has already been sent for review.

> 
> >> Also could you add some more detail to your earlier description of
> >> the bug?
> >
> > Our test case runs a multi-threaded ODP application in which the parent process
> > maps the region (yes, it uses MAP_DMA) and the child processes then access it.
> > As a workaround, we tried accessing the region once from the parent process before
> > creating the other accessor threads, and it worked as expected.
> 
> Thanks for the detail. So if the child processes start early -- they might fault while
> the VFIO_IOMMU_MAP_DMA was going on. And, since they only acquire
> mmap_lock in RO mode, both paths would end up calling io_remap_pfn_range()
> via the fault handler.

Yes, that's correct.

Thanks
-Bharat
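
A minimal userspace sketch of the ordering used in the workaround discussed above: the parent touches the mapping once before any accessor thread exists, so the first fault is taken by a single thread. Everything here is an assumption made for illustration. An anonymous mmap() stands in for the VFIO region, and `touch_then_spawn()`, `accessor()` and `MAX_ACCESSORS` are hypothetical names, not part of the ODP application or of VFIO.

```c
/* Userspace sketch only; anonymous memory stands in for the VFIO mapping. */
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define MAX_ACCESSORS 16	/* hypothetical cap on accessor threads */

struct region {
	volatile uint8_t *base;
	size_t len;
};

/* Accessor thread: read one byte at the start of the region. */
static void *accessor(void *p)
{
	struct region *r = p;

	(void)r->base[0];
	return NULL;
}

/* Map, touch once from the parent, then start the accessors. 0 on success. */
static int touch_then_spawn(size_t len, int nthreads)
{
	pthread_t tids[MAX_ACCESSORS];
	struct region r;
	void *p;
	int i, started = 0;

	if (len == 0 || nthreads < 1 || nthreads > MAX_ACCESSORS)
		return -1;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	r.base = p;
	r.len = len;

	/* The workaround: take the first fault before any accessor exists. */
	(void)r.base[0];

	for (i = 0; i < nthreads; i++)
		if (pthread_create(&tids[started], NULL, accessor, &r) == 0)
			started++;
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	munmap(p, len);
	return started == nthreads ? 0 : -1;
}
```

With a real vfio-pci mapping, a single touch should be enough because the fault handler maps the complete VMA; with anonymous memory the sketch only demonstrates the ordering, not the bug.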

> 
> Thanks
> Ankur
> 
> >
> > Thanks
> > -Bharat
> >
> >> In particular, AFAICS you are using ODP (-DPDK?) with multiple
> >> threads touching this region. From your stack, it looks like the
> >> fault was user-space generated, and I'm guessing you were not using
> >> the VFIO_IOMMU_MAP_DMA.
> >>
> >> Ankur
> >>
> >> -- >8 --
> >>
> >> Subject: [PATCH] vfio-pci: protect io_remap_pfn_range() from
> >> simultaneous calls
> >>
> >> vfio_pci_mmap_fault() maps the complete VMA on fault. With concurrent
> >> faults, this would result in multiple calls to io_remap_pfn_range(),
> >> where it would hit a BUG_ON(!pte_none(*pte)) in remap_pte_range().
> >> (It would also link the same VMA multiple times in vdev->vma_list but
> >> given the BUG_ON that is less serious.)
> >>
> >> Normally, however, this won't happen -- at least with
> >> vfio_iommu_type1 -- the VFIO_IOMMU_MAP_DMA path is protected by
> >> iommu->lock.
> >>
> >> If, however, we are using some kind of parallelization mechanism like
> >> this one with ktask under discussion [1], we would hit this.
> >> Even if we were doing this serially, given that vfio-pci remaps a
> >> larger extent than strictly necessary it should internally enforce
> >> coherence of its data structures.
> >>
> >> Handle this by using the VMA's presence in the vdev->vma_list as
> >> indicative of a fully mapped VMA and returning success early to all
> >> but the first VMA fault. Note that this is clearly optimistic given
> >> that the mapping is ongoing, and might mean that the caller sees more
> >> faults until the remap is done.
> >>
> >> [1]
> >> https://lore.kernel.org/linux-mm/20181105145141.6f9937f6@w520.home/
> >>
> >> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
> >> ---
> >>   drivers/vfio/pci/vfio_pci.c | 25 ++++++++++++++++++++++++-
> >>   1 file changed, 24 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >> index 65e7e6b44578..b9f509863db1 100644
> >> --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -1573,6 +1573,11 @@ static int __vfio_pci_add_vma(struct vfio_pci_device *vdev,
> >>   {
> >>   	struct vfio_pci_mmap_vma *mmap_vma;
> >>
> >> +	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
> >> +		if (mmap_vma->vma == vma)
> >> +			return 1;
> >> +	}
> >> +
> >>   	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
> >>   	if (!mmap_vma)
> >>   		return -ENOMEM;
> >> @@ -1613,6 +1618,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> >>   	struct vm_area_struct *vma = vmf->vma;
> >>   	struct vfio_pci_device *vdev = vma->vm_private_data;
> >>   	vm_fault_t ret = VM_FAULT_NOPAGE;
> >> +	int vma_present;
> >>
> >>   	mutex_lock(&vdev->vma_lock);
> >>   	down_read(&vdev->memory_lock);
> >> @@ -1623,7 +1629,21 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> >>   		goto up_out;
> >>   	}
> >>
> >> -	if (__vfio_pci_add_vma(vdev, vma)) {
> >> +	/*
> >> +	 * __vfio_pci_add_vma() either adds the vma to the vdev->vma_list
> >> +	 * (vma_present == 0), or indicates that the vma is already present
> >> +	 * on the list (vma_present == 1).
> >> +	 *
> >> +	 * Overload the meaning of this flag to also imply that the vma is
> >> +	 * fully mapped. This allows us to serialize the mapping -- ensuring
> >> +	 * that simultaneous faults will not both try to call
> >> +	 * io_remap_pfn_range().
> >> +	 *
> >> +	 * However, this might mean that callers to which we returned success
> >> +	 * optimistically will see more faults until the remap is complete.
> >> +	 */
> >> +	vma_present = __vfio_pci_add_vma(vdev, vma);
> >> +	if (vma_present < 0) {
> >>   		ret = VM_FAULT_OOM;
> >>   		mutex_unlock(&vdev->vma_lock);
> >>   		goto up_out;
> >> @@ -1631,6 +1651,9 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> >>
> >>   	mutex_unlock(&vdev->vma_lock);
> >>
> >> +	if (vma_present)
> >> +		goto up_out;
> >> +
> >>   	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> >>   			       vma->vm_end - vma->vm_start, vma->vm_page_prot))
> >>   		ret = VM_FAULT_SIGBUS;
> >> --
> >> 2.29.2
> >
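
The serialization idea in the quoted patch (check the list under vma_lock, and let only the thread that added the VMA perform the remap) can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel code: pthreads stand in for concurrent faults, an atomic counter stands in for the call to io_remap_pfn_range(), and all `model_*` names and `struct fault_arg` are hypothetical.

```c
/* Userspace model only; every name here is hypothetical, not kernel API. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct model_vma { int id; };

struct model_vma_node {
	struct model_vma *vma;
	struct model_vma_node *next;
};

struct model_vdev {
	pthread_mutex_t vma_lock;        /* plays the role of vdev->vma_lock */
	struct model_vma_node *vma_list; /* plays the role of vdev->vma_list */
	atomic_int remap_calls;          /* counts stand-in remap calls */
};

struct fault_arg {
	struct model_vdev *vdev;
	struct model_vma *vma;
};

/* Caller holds vma_lock. Returns 0 if added, 1 if already listed, -1 on OOM. */
static int model_add_vma(struct model_vdev *vdev, struct model_vma *vma)
{
	struct model_vma_node *node;

	for (node = vdev->vma_list; node; node = node->next)
		if (node->vma == vma)
			return 1;

	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	node->vma = vma;
	node->next = vdev->vma_list;
	vdev->vma_list = node;
	return 0;
}

/* One simulated fault: only the thread that added the VMA "remaps". */
static void *model_fault(void *p)
{
	struct fault_arg *arg = p;
	int present;

	pthread_mutex_lock(&arg->vdev->vma_lock);
	present = model_add_vma(arg->vdev, arg->vma);
	pthread_mutex_unlock(&arg->vdev->vma_lock);

	if (present == 0)
		atomic_fetch_add(&arg->vdev->remap_calls, 1);
	return NULL;
}

/* Run nthreads simulated faults on one VMA; return the remap count, or -1. */
static int model_concurrent_faults(int nthreads)
{
	struct model_vdev vdev = { .vma_list = NULL };
	struct model_vma vma = { .id = 1 };
	struct fault_arg arg = { &vdev, &vma };
	struct model_vma_node *node;
	pthread_t *tids;
	int i, started = 0, calls;

	if (nthreads <= 0)
		return -1;
	tids = calloc((size_t)nthreads, sizeof(*tids));
	if (!tids)
		return -1;
	pthread_mutex_init(&vdev.vma_lock, NULL);
	atomic_init(&vdev.remap_calls, 0);

	for (i = 0; i < nthreads; i++)
		if (pthread_create(&tids[started], NULL, model_fault, &arg) == 0)
			started++;
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	calls = atomic_load(&vdev.remap_calls);
	while ((node = vdev.vma_list)) {
		vdev.vma_list = node->next;
		free(node);
	}
	pthread_mutex_destroy(&vdev.vma_lock);
	free(tids);
	return calls;
}
```

As in the patch description, the threads that find the VMA already listed return before the first thread has necessarily finished its remap, so the model captures the "at most one remap call" property but not the extra faults those callers may see in the meantime.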