linux-kernel - Re: [RFC PATCH v4 06/13] vfio: parallelize vfio_pin_map

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181106024228.sxkn3s22mfkf7lcc@ca-dmjordan1.us.oracle.com>
Date:   Mon, 5 Nov 2018 18:42:28 -0800
From:   Daniel Jordan <daniel.m.jordan@...cle.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Daniel Jordan <daniel.m.jordan@...cle.com>, linux-mm@...ck.org,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        aarcange@...hat.com, aaron.lu@...el.com, akpm@...ux-foundation.org,
        bsd@...hat.com, darrick.wong@...cle.com,
        dave.hansen@...ux.intel.com, jgg@...lanox.com, jwadams@...gle.com,
        jiangshanlai@...il.com, mhocko@...nel.org, mike.kravetz@...cle.com,
        Pavel.Tatashin@...rosoft.com, prasad.singamsetty@...cle.com,
        rdunlap@...radead.org, steven.sistare@...cle.com,
        tim.c.chen@...el.com, tj@...nel.org, vbabka@...e.cz
Subject: Re: [RFC PATCH v4 06/13] vfio: parallelize vfio_pin_map_dma

On Mon, Nov 05, 2018 at 02:51:41PM -0700, Alex Williamson wrote:
> On Mon,  5 Nov 2018 11:55:51 -0500
> Daniel Jordan <daniel.m.jordan@...cle.com> wrote:
> > +static int vfio_pin_map_dma_chunk(unsigned long start_vaddr,
> > +				  unsigned long end_vaddr,
> > +				  struct vfio_pin_args *args)
> >  {
> > -	dma_addr_t iova = dma->iova;
> > -	unsigned long vaddr = dma->vaddr;
> > -	size_t size = map_size;
> > +	struct vfio_dma *dma = args->dma;
> > +	dma_addr_t iova = dma->iova + (start_vaddr - dma->vaddr);
> > +	unsigned long unmapped_size = end_vaddr - start_vaddr;
> > +	unsigned long pfn, mapped_size = 0;
> >  	long npage;
> > -	unsigned long pfn, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> >  	int ret = 0;
> >  
> > -	while (size) {
> > +	while (unmapped_size) {
> >  		/* Pin a contiguous chunk of memory */
> > -		npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
> > -					      size >> PAGE_SHIFT, &pfn, limit);
> > +		npage = vfio_pin_pages_remote(dma, start_vaddr + mapped_size,
> > +					      unmapped_size >> PAGE_SHIFT,
> > +					      &pfn, args->limit, args->mm);
> >  		if (npage <= 0) {
> >  			WARN_ON(!npage);
> >  			ret = (int)npage;
> > @@ -1052,22 +1067,50 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >  		}
> >  
> >  		/* Map it! */
> > -		ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
> > -				     dma->prot);
> > +		ret = vfio_iommu_map(args->iommu, iova + mapped_size, pfn,
> > +				     npage, dma->prot);
> >  		if (ret) {
> > -			vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
> > +			vfio_unpin_pages_remote(dma, iova + mapped_size, pfn,
> >  						npage, true);
> >  			break;
> >  		}
> >  
> > -		size -= npage << PAGE_SHIFT;
> > -		dma->size += npage << PAGE_SHIFT;
> > +		unmapped_size -= npage << PAGE_SHIFT;
> > +		mapped_size   += npage << PAGE_SHIFT;
> >  	}
> >  
> > +	return (ret == 0) ? KTASK_RETURN_SUCCESS : ret;
> 
> Overall I'm a big fan of this, but I think there's an undo problem
> here.  Per 03/13, kc_undo_func is only called for successfully
> completed chunks and each kc_thread_func should handle cleanup of any
> intermediate work before failure.  That's not done here afaict.  Should
> we be calling the vfio_pin_map_dma_undo() manually on the completed
> range before returning error?

Yes, we should be, thanks very much for catching this.

At least I documented what I didn't do?  :)

> 
> > +}
> > +
> > +static void vfio_pin_map_dma_undo(unsigned long start_vaddr,
> > +				  unsigned long end_vaddr,
> > +				  struct vfio_pin_args *args)
> > +{
> > +	struct vfio_dma *dma = args->dma;
> > +	dma_addr_t iova = dma->iova + (start_vaddr - dma->vaddr);
> > +	dma_addr_t end  = dma->iova + (end_vaddr   - dma->vaddr);
> > +
> > +	vfio_unmap_unpin(args->iommu, args->dma, iova, end, true);
> > +}
> > +
> > +static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
> > +			    size_t map_size)
> > +{
> > +	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > +	int ret = 0;
> > +	struct vfio_pin_args args = { iommu, dma, limit, current->mm };
> > +	/* Stay on PMD boundary in case THP is being used. */
> > +	DEFINE_KTASK_CTL(ctl, vfio_pin_map_dma_chunk, &args, PMD_SIZE);
> 
> PMD_SIZE chunks almost seems too convenient, I wonder a) is that really
> enough work per thread, and b) is this really successfully influencing
> THP?  Thanks,

Yes, you're right on both counts.  I'd been using PUD_SIZE for a while in
testing and meant to switch it back to KTASK_MEM_CHUNK (128M) but used PMD_SIZE
by mistake.  PUD_SIZE chunks have made thread finishing times too spread out
in some cases, so 128M seems to be a reasonable compromise.

Thanks for the thorough and quick review.

Daniel