Date:	Fri, 1 May 2015 13:39:53 +1000
From:	David Gibson <david@...son.dropbear.id.au>
To:	Paul Mackerras <paulus@...ba.org>
Cc:	Alexey Kardashevskiy <aik@...abs.ru>,
	linuxppc-dev@...ts.ozlabs.org,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Alex Williamson <alex.williamson@...hat.com>,
	Gavin Shan <gwshan@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical
 addresses translation cache

On Thu, Apr 30, 2015 at 06:25:25PM +1000, Paul Mackerras wrote:
> On Thu, Apr 30, 2015 at 04:34:55PM +1000, David Gibson wrote:
> > On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
> > > We are adding support for DMA memory pre-registration to be used in
> > > conjunction with VFIO. The idea is that the userspace which is going to
> > > run a guest may want to pre-register a user space memory region so
> > > it all gets pinned once and never goes away. Having this done,
> > > a hypervisor will not have to pin/unpin pages on every DMA map/unmap
> > > request. This is going to help with multiple pinning of the same memory
> > > and in-kernel acceleration of DMA requests.
> > > 
> > > This adds a list of memory regions to mm_context_t. Each region consists
> > > of a header and a list of physical addresses. This adds API to:
> > > 1. register/unregister memory regions;
> > > 2. do final cleanup (which puts all pre-registered pages);
> > > 3. do userspace to physical address translation;
> > > 4. manage a mapped pages counter; when it is zero, it is safe to
> > > unregister the region.
> > > 
> > > Multiple registration of the same region is allowed, kref is used to
> > > track the number of registrations.
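For orientation, the API described above amounts to roughly the following
surface.  This is a paraphrase for illustration only: apart from
mm_iommu_alloc() and mm_iommu_table_group_mem_t, which appear in the hunk
below, the names are placeholders and not necessarily the patch's.

/* 1. register / unregister a userspace region (pins its pages once) */
long mm_iommu_alloc(unsigned long ua, unsigned long entries,
		struct mm_iommu_table_group_mem_t **pmem);
long mm_iommu_free(struct mm_iommu_table_group_mem_t *mem);

/* 2. final cleanup on mm teardown: put every pre-registered page */
void mm_iommu_cleanup(mm_context_t *ctx);

/* 3. userspace address -> host physical address translation */
long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
		unsigned long ua, unsigned long *hpa);

/* 4. mapped-pages counter; the region may only be unregistered at zero */
long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
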
> > 
> > [snip]
> > > +long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> > > +		struct mm_iommu_table_group_mem_t **pmem)
> > > +{
> > > +	struct mm_iommu_table_group_mem_t *mem;
> > > +	long i, j;
> > > +	struct page *page = NULL;
> > > +
> > > +	list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> > > +			next) {
> > > +		if ((mem->ua == ua) && (mem->entries == entries))
> > > +			return -EBUSY;
> > > +
> > > +		/* Overlap? */
> > > +		if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
> > > +				(ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
> > > +			return -EINVAL;
> > > +	}
> > > +
> > > +	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > > +	if (!mem)
> > > +		return -ENOMEM;
> > > +
> > > +	mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
> > > +	if (!mem->hpas) {
> > > +		kfree(mem);
> > > +		return -ENOMEM;
> > > +	}
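(Aside: the "Overlap?" test in the loop above is the usual half-open
interval check, with the lengths expressed as entries << PAGE_SHIFT.
Stated on its own, for two byte ranges:

/* ranges [a, a+a_bytes) and [b, b+b_bytes) intersect iff each starts
 * before the other ends */
static bool regions_overlap(unsigned long a, unsigned long a_bytes,
		unsigned long b, unsigned long b_bytes)
{
	return (a < b + b_bytes) && (b < a + a_bytes);
}
)
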
> > 
> > So, I've thought more about this and I'm really confused as to what
> > this is supposed to be accomplishing.
> > 
> > I see that you need to keep track of what regions are registered, so
> > you don't double lock or unlock, but I don't see what the point of
> > actually storing the translations in hpas is.
> > 
> > I had assumed it was so that you could later on get to the
> > translations in real mode when you do in-kernel acceleration.  But
> > that doesn't make sense, because the array is vmalloc()ed, so can't be
> > accessed in real mode anyway.
> 
> We can access vmalloc'd arrays in real mode using real_vmalloc_addr().

Ah, ok.
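
(For anyone following along: the idea is that vmalloc space is just another
mapping of ordinary RAM, so once the backing physical page is known the same
data is reachable through the linear mapping, which is usable with the MMU
off.  This is only the concept, not the kernel's actual helper; a real-mode
version has to walk the kernel page table without faulting, which is what
real_vmalloc_addr() does.

/* Illustration only: return the linear-map alias of a vmalloc address */
static void *vmalloc_linear_alias(void *x)
{
	struct page *pg = vmalloc_to_page(x);	/* page backing the vmalloc addr */

	if (!pg)
		return NULL;
	return __va(page_to_phys(pg)) + offset_in_page(x);
}
)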

> > I can't think of a circumstance in which you can use hpas where you
> > couldn't just walk the page tables anyway.
> 
> The problem with walking the page tables is that there is no guarantee
> that the page you find that way is the page that was returned by the
> gup_fast() we did earlier.  Storing the hpas means that we know for
> sure that the page we're doing DMA to is one that we have an elevated
> page count on.
> 
> Also, there are various points where a Linux PTE is made temporarily
> invalid for a short time.  If we happened to do a H_PUT_TCE on one cpu
> while another cpu was doing that, we'd get a spurious failure returned
> by the H_PUT_TCE.

I think we want this explanation in the commit message.  And/or in a
comment somewhere, I'm not sure.
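
For concreteness, a sketch of the sort of real-mode lookup this enables.
Names, field types and helpers beyond those quoted above are guesses for
illustration, not taken from the patch.

/* Sketch, not the patch: translate a pre-registered userspace address to
 * the host physical address captured at registration time. */
static long sketch_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
		unsigned long ua, unsigned long *hpa)
{
	unsigned long entry;
	unsigned long *hpas;	/* assuming hpas[] holds page-aligned HPAs */

	if (ua < mem->ua)
		return -EFAULT;
	entry = (ua - mem->ua) >> PAGE_SHIFT;
	if (entry >= mem->entries)
		return -EFAULT;

	/* hpas is vmalloc'd, so use its linear-map alias in real mode */
	hpas = real_vmalloc_addr(mem->hpas);
	if (!hpas)
		return -EFAULT;

	*hpa = hpas[entry] | (ua & ~PAGE_MASK);
	return 0;
}

That way the translation never touches the Linux page tables at TCE-put
time, so it can't race with a transiently invalid PTE, and it always
resolves to the page gup_fast() pinned.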

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
