Message-ID: <Zj33cUe7HYOIfj5N@yzhao56-desk.sh.intel.com>
Date: Fri, 10 May 2024 18:31:13 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>, <x86@...nel.org>,
	<jgg@...dia.com>, <kevin.tian@...el.com>, <iommu@...ts.linux.dev>,
	<pbonzini@...hat.com>, <seanjc@...gle.com>, <dave.hansen@...ux.intel.com>,
	<luto@...nel.org>, <peterz@...radead.org>, <tglx@...utronix.de>,
	<mingo@...hat.com>, <bp@...en8.de>, <hpa@...or.com>, <corbet@....net>,
	<joro@...tes.org>, <will@...nel.org>, <robin.murphy@....com>,
	<baolu.lu@...ux.intel.com>, <yi.l.liu@...el.com>
Subject: Re: [PATCH 4/5] vfio/type1: Flush CPU caches on DMA pages in
 non-coherent domains

On Thu, May 09, 2024 at 12:10:49PM -0600, Alex Williamson wrote:
> On Tue,  7 May 2024 14:21:38 +0800
> Yan Zhao <yan.y.zhao@...el.com> wrote:
...
> >  drivers/vfio/vfio_iommu_type1.c | 51 +++++++++++++++++++++++++++++++++
> >  1 file changed, 51 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index b5c15fe8f9fc..ce873f4220bf 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -74,6 +74,7 @@ struct vfio_iommu {
> >  	bool			v2;
> >  	bool			nesting;
> >  	bool			dirty_page_tracking;
> > +	bool			has_noncoherent_domain;
> >  	struct list_head	emulated_iommu_groups;
> >  };
> >  
> > @@ -99,6 +100,7 @@ struct vfio_dma {
> >  	unsigned long		*bitmap;
> >  	struct mm_struct	*mm;
> >  	size_t			locked_vm;
> > +	bool			cache_flush_required; /* For noncoherent domain */
> 
> Poor packing, minimally this should be grouped with the other bools in
> the structure, longer term they should likely all be converted to
> bit fields.
Yes. Will do!
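Something like this as a first step, I suppose (just regrouping the existing
bools; converting them to bit fields can be a follow-up):

struct vfio_dma {
        ...
        bool                    iommu_mapped;
        bool                    lock_cap;       /* capable(CAP_IPC_LOCK) */
        bool                    vaddr_invalid;
        bool                    cache_flush_required; /* For noncoherent domain */
        ...
};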

> 
> >  };
> >  
> >  struct vfio_batch {
> > @@ -716,6 +718,9 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> >  	long unlocked = 0, locked = 0;
> >  	long i;
> >  
> > +	if (dma->cache_flush_required)
> > +		arch_clean_nonsnoop_dma(pfn << PAGE_SHIFT, npage << PAGE_SHIFT);
> > +
> >  	for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
> >  		if (put_pfn(pfn++, dma->prot)) {
> >  			unlocked++;
> > @@ -1099,6 +1104,8 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >  					    &iotlb_gather);
> >  	}
> >  
> > +	dma->cache_flush_required = false;
> > +
> >  	if (do_accounting) {
> >  		vfio_lock_acct(dma, -unlocked, true);
> >  		return 0;
> > @@ -1120,6 +1127,21 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> >  	iommu->dma_avail++;
> >  }
> >  
> > +static void vfio_update_noncoherent_domain_state(struct vfio_iommu *iommu)
> > +{
> > +	struct vfio_domain *domain;
> > +	bool has_noncoherent = false;
> > +
> > +	list_for_each_entry(domain, &iommu->domain_list, next) {
> > +		if (domain->enforce_cache_coherency)
> > +			continue;
> > +
> > +		has_noncoherent = true;
> > +		break;
> > +	}
> > +	iommu->has_noncoherent_domain = has_noncoherent;
> > +}
> 
> This should be merged with vfio_domains_have_enforce_cache_coherency()
> and the VFIO_DMA_CC_IOMMU extension (if we keep it, see below).
Will convert it to a counter and do the merge.
Thanks for pointing it out!
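With a counter (say iommu->noncoherent_domains), the merged helper could
shrink to something like:

static bool vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu)
{
        return !iommu->noncoherent_domains;
}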

> 
> > +
> >  static void vfio_update_pgsize_bitmap(struct vfio_iommu *iommu)
> >  {
> >  	struct vfio_domain *domain;
> > @@ -1455,6 +1477,12 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >  
> >  	vfio_batch_init(&batch);
> >  
> > +	/*
> > +	 * Record necessity to flush CPU cache to make sure CPU cache is flushed
> > +	 * for both pin & map and unmap & unpin (for unwind) paths.
> > +	 */
> > +	dma->cache_flush_required = iommu->has_noncoherent_domain;
> > +
> >  	while (size) {
> >  		/* Pin a contiguous chunk of memory */
> >  		npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
> > @@ -1466,6 +1494,10 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >  			break;
> >  		}
> >  
> > +		if (dma->cache_flush_required)
> > +			arch_clean_nonsnoop_dma(pfn << PAGE_SHIFT,
> > +						npage << PAGE_SHIFT);
> > +
> >  		/* Map it! */
> >  		ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
> >  				     dma->prot);
> > @@ -1683,9 +1715,14 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> >  	for (; n; n = rb_next(n)) {
> >  		struct vfio_dma *dma;
> >  		dma_addr_t iova;
> > +		bool cache_flush_required;
> >  
> >  		dma = rb_entry(n, struct vfio_dma, node);
> >  		iova = dma->iova;
> > +		cache_flush_required = !domain->enforce_cache_coherency &&
> > +				       !dma->cache_flush_required;
> > +		if (cache_flush_required)
> > +			dma->cache_flush_required = true;
> 
> The variable name here isn't accurate and the logic is confusing.  If
> the domain does not enforce coherency and the mapping is not tagged as
> requiring a cache flush, then we need to mark the mapping as requiring
> a cache flush.  So the variable state is something more akin to
> set_cache_flush_required.  But all we're saving with this is a
> redundant set if the mapping is already tagged as requiring a cache
> flush, so it could really be simplified to:
> 
> 		dma->cache_flush_required = !domain->enforce_cache_coherency;
Sorry about the confusion.

If dma->cache_flush_required is set to true by a domain that does not enforce
cache coherency, we don't want it to be reset to false when a domain that does
enforce cache coherency is attached later, because of the lazy flushing design.
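In other words, the intent is a sticky set, e.g.

        /* set by any non-coherent domain, never cleared by a coherent one */
        if (!domain->enforce_cache_coherency)
                dma->cache_flush_required = true;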

> It might add more clarity to just name the mapping flag
> dma->mapped_noncoherent.

dma->cache_flush_required marks whether the pages of a vfio_dma require a
cache flush on the subsequent mapping into the first non-coherent domain and
on page unpinning.
So, mapped_noncoherent may not be accurate.
Do you think it's better to add a comment as explanation?

struct vfio_dma {
        ...    
        bool                    iommu_mapped;
        bool                    lock_cap;       /* capable(CAP_IPC_LOCK) */
        bool                    vaddr_invalid;
        /*
         *  Mark whether it is required to flush CPU caches when mapping pages
         *  of the vfio_dma to the first non-coherent domain and when unpinning
         *  pages of the vfio_dma
         */
        bool                    cache_flush_required;
        ...    
};
> 
> >  
> >  		while (iova < dma->iova + dma->size) {
> >  			phys_addr_t phys;
> > @@ -1737,6 +1774,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> >  				size = npage << PAGE_SHIFT;
> >  			}
> >  
> > +			if (cache_flush_required)
> > +				arch_clean_nonsnoop_dma(phys, size);
> > +
> 
> I agree with others as well that this arch callback should be named
> something relative to the cache-flush/write-back operation that it
> actually performs instead of the overall reason for us requiring it.
>
Ok. If there are no objections, I'll rename it to arch_flush_cache_phys() as
suggested by Kevin.
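For x86, a rough sketch of the renamed helper (assuming the body stays a
kmap_local_pfn() + clflush_cache_range() loop over the affected pages):

void arch_flush_cache_phys(phys_addr_t phys, size_t size)
{
        unsigned long pfn = PHYS_PFN(phys);
        size_t offset = offset_in_page(phys);

        while (size) {
                size_t len = min_t(size_t, size, PAGE_SIZE - offset);
                void *va = kmap_local_pfn(pfn);

                /* write back (and flush) the CPU cache for this page chunk */
                clflush_cache_range(va + offset, len);
                kunmap_local(va);

                pfn++;
                offset = 0;
                size -= len;
        }
}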

> >  			ret = iommu_map(domain->domain, iova, phys, size,
> >  					dma->prot | IOMMU_CACHE,
> >  					GFP_KERNEL_ACCOUNT);
> > @@ -1801,6 +1841,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> >  			vfio_unpin_pages_remote(dma, iova, phys >> PAGE_SHIFT,
> >  						size >> PAGE_SHIFT, true);
> >  		}
> > +		dma->cache_flush_required = false;
> >  	}
> >  
> >  	vfio_batch_fini(&batch);
> > @@ -1828,6 +1869,9 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain, struct list_head *
> >  	if (!pages)
> >  		return;
> >  
> > +	if (!domain->enforce_cache_coherency)
> > +		arch_clean_nonsnoop_dma(page_to_phys(pages), PAGE_SIZE * 2);
> > +
> >  	list_for_each_entry(region, regions, list) {
> >  		start = ALIGN(region->start, PAGE_SIZE * 2);
> >  		if (start >= region->end || (region->end - start < PAGE_SIZE * 2))
> > @@ -1847,6 +1891,9 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain, struct list_head *
> >  		break;
> >  	}
> >  
> > +	if (!domain->enforce_cache_coherency)
> > +		arch_clean_nonsnoop_dma(page_to_phys(pages), PAGE_SIZE * 2);
> > +
> 
> Seems like this use case isn't subject to the unmap aspect since these
> are kernel allocated and freed pages rather than userspace pages.
> There's not an "ongoing use of the page" concern.
> 
> The window of opportunity for a device to discover and exploit the
> mapping side issue appears almost impossibly small.
>
The concern is a malicious device attempting DMAs on its own.
Do you think this concern is valid?
As these are only extra flushes for 4 pages, what about keeping them for safety?

> >  	__free_pages(pages, order);
> >  }
> >  
> > @@ -2308,6 +2355,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> >  
> >  	list_add(&domain->next, &iommu->domain_list);
> >  	vfio_update_pgsize_bitmap(iommu);
> > +	if (!domain->enforce_cache_coherency)
> > +		vfio_update_noncoherent_domain_state(iommu);
> 
> Why isn't this simply:
> 
> 	if (!domain->enforce_cache_coherency)
> 		iommu->has_noncoherent_domain = true;
Yes, it's simpler during attach.

> Or maybe:
> 
> 	if (!domain->enforce_cache_coherency)
> 		iommu->noncoherent_domains++;
Yes, this counter is better.
I previously thought a bool could save some space.

> >  done:
> >  	/* Delete the old one and insert new iova list */
> >  	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
> > @@ -2508,6 +2557,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> >  			}
> >  			iommu_domain_free(domain->domain);
> >  			list_del(&domain->next);
> > +			if (!domain->enforce_cache_coherency)
> > +				vfio_update_noncoherent_domain_state(iommu);
> 
> If we were to just track the number of noncoherent domains, this could
> simply be iommu->noncoherent_domains-- and VFIO_DMA_CC_IOMMU could be:
> 
> 	return iommu->noncoherent_domains ? 0 : 1;
> 
> Maybe there should be wrappers for list_add() and list_del() relative
> to the iommu domain list to make it just be a counter.  Thanks,

Do you think we can skip the "iommu->noncoherent_domains--" in
vfio_iommu_type1_release() when the iommu is about to be freed?

I'm asking partly because it's hard for me to find a good name for the wrapper
around list_del().  :)

The list_del() follows vfio_release_domain() in vfio_iommu_type1_release(),
but not in vfio_iommu_type1_detach_group().
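If we do add them, maybe something like below (placeholder names):

static void vfio_iommu_link_domain(struct vfio_iommu *iommu,
                                   struct vfio_domain *domain)
{
        list_add(&domain->next, &iommu->domain_list);
        if (!domain->enforce_cache_coherency)
                iommu->noncoherent_domains++;
}

static void vfio_iommu_unlink_domain(struct vfio_iommu *iommu,
                                     struct vfio_domain *domain)
{
        list_del(&domain->next);
        if (!domain->enforce_cache_coherency)
                iommu->noncoherent_domains--;
}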

> 
> 
> >  			kfree(domain);
> >  			vfio_iommu_aper_expand(iommu, &iova_copy);
> >  			vfio_update_pgsize_bitmap(iommu);
> 
