Message-ID: <BN9PR11MB52760945FBCF542F77DCF4CA8CE02@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Thu, 23 Jan 2025 08:13:34 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Mostafa Saleh <smostafa@...gle.com>
CC: Jason Gunthorpe <jgg@...pe.ca>, "iommu@...ts.linux.dev"
	<iommu@...ts.linux.dev>, "kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "catalin.marinas@....com"
	<catalin.marinas@....com>, "will@...nel.org" <will@...nel.org>,
	"maz@...nel.org" <maz@...nel.org>, "oliver.upton@...ux.dev"
	<oliver.upton@...ux.dev>, "joey.gouly@....com" <joey.gouly@....com>,
	"suzuki.poulose@....com" <suzuki.poulose@....com>, "yuzenghui@...wei.com"
	<yuzenghui@...wei.com>, "robdclark@...il.com" <robdclark@...il.com>,
	"joro@...tes.org" <joro@...tes.org>, "robin.murphy@....com"
	<robin.murphy@....com>, "jean-philippe@...aro.org"
	<jean-philippe@...aro.org>, "nicolinc@...dia.com" <nicolinc@...dia.com>,
	"vdonnefort@...gle.com" <vdonnefort@...gle.com>, "qperret@...gle.com"
	<qperret@...gle.com>, "tabba@...gle.com" <tabba@...gle.com>,
	"danielmentz@...gle.com" <danielmentz@...gle.com>, "tzukui@...gle.com"
	<tzukui@...gle.com>
Subject: RE: [RFC PATCH v2 00/58] KVM: Arm SMMUv3 driver for pKVM

> From: Mostafa Saleh <smostafa@...gle.com>
> Sent: Wednesday, January 22, 2025 7:04 PM
> 
> On Fri, Jan 17, 2025 at 06:57:12AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@...pe.ca>
> > > Sent: Friday, January 17, 2025 3:15 AM
> > >
> > > On Thu, Jan 16, 2025 at 06:39:31AM +0000, Tian, Kevin wrote:
> > > > > From: Mostafa Saleh <smostafa@...gle.com>
> > > > > Sent: Wednesday, January 8, 2025 8:10 PM
> > > > >
> > > > > On Thu, Jan 02, 2025 at 04:16:14PM -0400, Jason Gunthorpe wrote:
> > > > > > On Fri, Dec 13, 2024 at 07:39:04PM +0000, Mostafa Saleh wrote:
> > > > > > > Yeah, SVA is tricky, I guess for that we would have to use nesting,
> > > > > > > but tbh, I don’t think it’s a deal breaker for now.
> > > > > >
> > > > > > > Again, it depends what your actual use case for translation is inside
> > > > > > > the host/guest environments. It would be good to clearly spell this
> > > > > > > out.
> > > > > > > There are few drivers that directly manipulate the iommu_domains
> > > > > > > of a device: a few GPUs, ath1x wireless, some Tegra stuff, "venus".
> > > > > > > Which of those are you targeting?
> > > > > >
> > > > >
> > > > > Not sure I understand this point about manipulating domains.
> > > > > > AFAIK, SVA is not that common, including in the mobile space, but I
> > > > > > can be wrong; that’s why it’s not a priority here.
> > > >
> > > > Nested translation is required beyond SVA. A scenario which requires
> > > > a vIOMMU and multiple device domains within the guest would like to
> > > > embrace nesting. Especially for ARM vSMMU nesting is a must.
> 
> We can still do para-virtualization for guests the same way we do for the
> host and use a single stage IOMMU.

Same way, but both require a nested setup.

Conceptually there are two layers of address translation: GVA->GPA via
the guest page table, and GPA->HPA via the pKVM page table.

The difference between host and guest is only in the GPA mapping. For the
host it's 1:1, with additional hardening on which portions can be mapped
and which cannot. For the guest it's non-identity, with the mapping
established by the host.

A nested translation naturally fits those conceptual layers.

Using a single-stage IOMMU means you need to combine the two layers
into one, i.e. GVA->HPA with GPA removed. Then you have to
paravirtualize the guest page table so that every guest PTE change is
intercepted to replace the GPA with the HPA.

Doing so completely kills the benefit of SVA, which is why Jason called
it a no-go.
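
To make the two layers concrete, here is a minimal Python sketch. It is
a toy model, not pKVM or SMMUv3 code: page tables are plain dicts keyed
by page number, and every name in it (nested_translate, ShadowTable,
guest_set_pte) is hypothetical. It only illustrates that a nested walk
picks up a guest PTE change immediately, while a combined GVA->HPA
table stays stale until the change is intercepted.

```python
# Toy model: page tables are dicts mapping page numbers to page numbers.
# Illustrative only; none of these names exist in pKVM or the SMMUv3 driver.

def nested_translate(stage1, stage2, gva):
    """Two-stage walk: GVA -> GPA (guest table), then GPA -> HPA (pKVM table)."""
    gpa = stage1[gva]
    return stage2[gpa]


class ShadowTable:
    """Single-stage table holding GVA -> HPA, maintained by paravirt calls."""

    def __init__(self, stage2):
        self.stage2 = stage2      # GPA -> HPA, owned by the hypervisor
        self.combined = {}        # GVA -> HPA, what the hardware would walk
        self.intercepts = 0       # number of paravirt calls taken so far

    def guest_set_pte(self, gva, gpa):
        # Every guest PTE update must be intercepted so that the GPA
        # can be replaced with the corresponding HPA.
        self.intercepts += 1
        self.combined[gva] = self.stage2[gpa]

    def translate(self, gva):
        return self.combined[gva]


stage2 = {0x10: 0x80, 0x11: 0x95}     # guest case: non-identity GPA -> HPA
stage1 = {0x1: 0x10}                  # guest-owned GVA -> GPA

shadow = ShadowTable(stage2)
shadow.guest_set_pte(0x1, 0x10)
assert nested_translate(stage1, stage2, 0x1) == shadow.translate(0x1) == 0x80

# The guest remaps the page. With nesting it just writes its own table...
stage1[0x1] = 0x11
assert nested_translate(stage1, stage2, 0x1) == 0x95

# ...while the single-stage table is stale until another intercept happens.
assert shadow.translate(0x1) == 0x80
shadow.guest_set_pte(0x1, 0x11)
assert shadow.translate(0x1) == 0x95
```

With SVA the first-stage table is the CPU page table of the process, so
in this model every ordinary page table update would have to go through
guest_set_pte().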

> 
> > >
> > > Right, if you need an iommu domain in the guest there are only three
> > > mainstream ways to get this in Linux:
> > >  1) Use the DMA API and have the iommu group be translating. This is
> > >     optional in that the DMA API usually supports identity as an option.
> > >  2) A driver directly calls iommu_paging_domain_alloc() and manually
> > >     attaches it to some device, and does not use the DMA API. My list
> > >     above of ath1x/etc are examples doing this
> > >  3) Use VFIO
> > >
> > > My remark to Mostafa is to be specific, which of the above do you want
> > > to do in your mobile guest (and what driver exactly if #2) and why.
> > >
> > > This will help inform what the performance profile looks like and
> > > guide if nesting/para virt is appropriate.
> >
> 
> AFAIK, the most common use cases would be:
> - Devices using DMA API because it requires a lot of memory to be
>   contiguous in IOVA, which is hard to do with identity
> - Devices with security requirements/constraints to be isolated from the
>   rest of the system, also using DMA API
> - VFIO is something we are looking into at the moment and have prototyped with
>   pKVM, and it should be supported soon in Android (only for platform
>   devices for now)

What really matters is the frequency of map/unmap.
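
A rough way to see why is the toy cost model below, in Python. The
function names and all cost values are placeholders in arbitrary units,
chosen only for illustration; they are not measurements from pKVM or
any SMMU. The model charges each map/unmap either a hypercall
(paravirt) or a plain PTE write (nesting), and each TLB miss either a
single-stage or a two-stage walk.

```python
# Toy cost model in arbitrary units. All values are illustrative
# assumptions, not measured numbers.

def paravirt_cost(map_unmap_ops, tlb_misses, hypercall_cost=100, walk_cost=4):
    """Single stage: every map/unmap is a hypercall, walks are short."""
    return map_unmap_ops * hypercall_cost + tlb_misses * walk_cost


def nested_cost(map_unmap_ops, tlb_misses, pte_write_cost=1, walk_cost=24):
    """Two stages: map/unmap is a guest PTE write, walks are longer."""
    return map_unmap_ops * pte_write_cost + tlb_misses * walk_cost


# Mostly static mappings: few map/unmap calls, many TLB misses.
static = (10, 10_000)
# Highly dynamic mappings: map/unmap dominates.
dynamic = (100_000, 10_000)

assert paravirt_cost(*static) < nested_cost(*static)    # 41000 < 240010
assert nested_cost(*dynamic) < paravirt_cost(*dynamic)  # 340000 < 10040000
```

Where the crossover sits depends entirely on the real costs, which is
what actual numbers would have to show.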

> 
> > Yeah that part would be critical to help decide which route to pursue
> > first. Even when all options might be required in the end when pKVM
> > is scaled to more scenarios, as you mentioned in another mail, a staging
> > approach would be much preferable to evolve.
> 
> I agree that would probably be the case. I will work on more staging
> approach for v3, mostly without the pv part as Jason suggested.
> 
> >
> > The pros/cons of nesting vs. para virt are clear - the more static the
> > mapping is, the more gain from the para approach due to less page
> > walking and a smaller TLB footprint, while conversely nesting performs
> > much better by avoiding frequent para calls on page table mgmt. 😊
> 
> I am also working to get the numbers for both cases so we know
> the order of magnitude of each case, as I guess it won't be as clear
> for large systems with many DMA initiators what approach is best.
> 
> 

That'd be great!
