Message-ID: <20250102201614.GA26854@ziepe.ca>
Date: Thu, 2 Jan 2025 16:16:14 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: Mostafa Saleh <smostafa@...gle.com>
Cc: iommu@...ts.linux.dev, kvmarm@...ts.linux.dev,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	catalin.marinas@....com, will@...nel.org, maz@...nel.org,
	oliver.upton@...ux.dev, joey.gouly@....com, suzuki.poulose@....com,
	yuzenghui@...wei.com, robdclark@...il.com, joro@...tes.org,
	robin.murphy@....com, jean-philippe@...aro.org, nicolinc@...dia.com,
	vdonnefort@...gle.com, qperret@...gle.com, tabba@...gle.com,
	danielmentz@...gle.com, tzukui@...gle.com
Subject: Re: [RFC PATCH v2 00/58] KVM: Arm SMMUv3 driver for pKVM

On Fri, Dec 13, 2024 at 07:39:04PM +0000, Mostafa Saleh wrote:
> Thanks a lot for taking the time to review this, I tried to reply to all
> points. However I think a main source of confusion was that this is only
> for the host kernel not guests, with this series guests still have no
> access to DMA under pKVM. I hope that clarifies some of the points.

I think I just used different words. I meant the direct guest of pkvm,
including what you are calling the host kernel.

> > The cover letter doesn't explain why someone needs page tables in the
> > guest at all?
> 
> This is not for guests but for the host, the hypervisor needs to
> establish DMA isolation between the host and the hypervisor/guests.

Why isn't this done directly in pkvm by setting up IOMMU tables that
identity map the host/guest's CPU mapping? Why does the host kernel or
guest kernel need to have page tables?
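
As a rough illustration of the identity-map idea raised above, here is a toy
model, not pkvm code. It assumes memory ownership is tracked per page frame and
that the hypervisor builds one table per host/guest in which IOVA == PA. The
names build_identity_stage2, dma_translate and the frame ranges are
hypothetical and exist only for this sketch.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages for the sketch


def build_identity_stage2(owned_pfns):
    """Return a table mapping every owned page frame to itself (IOVA == PA)."""
    return {pfn: pfn for pfn in owned_pfns}


def dma_translate(table, iova):
    """Translate a device address; fault if the frame is not in the table."""
    pfn, offset = divmod(iova, PAGE_SIZE)
    if pfn not in table:
        raise PermissionError(f"DMA fault at {iova:#x}")
    return table[pfn] * PAGE_SIZE + offset


# Hypothetical ownership: frames 0-7 belong to the host, 8-9 to the hypervisor.
host_pfns = set(range(0, 8))
host_table = build_identity_stage2(host_pfns)
```

In this model the host needs no page tables of its own: a device assigned to
the host can reach any host-owned frame at its physical address, and any
access to a hypervisor-owned frame faults.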

> However, guest DMA support is optional and only needed for device
> passthrough, 

Why? The CC cases are having the pkvm layer control the translation,
so when the host spawns a guest the pkvm will setup a contained IOMMU
translation for that guest as well.

Don't you also want to protect the guests from the host in this model?

> We can do that for the host also, which is discussed in the v1 cover
> letter. However, we try to keep feature parity with the normal (VHE)
> KVM arm64 support, so constraining KVM support to not have IOVA spaces
> for devices seems too much and impractical on modern systems (phones for
> example).

But why? Do you have current use cases on phone where you need to have
device-specific iommu_domains? What are they? Answering this goes a
long way to understanding the real performance of a para virt approach.

> There is no hacking for the arm-smmu-v3 driver, but mostly splitting
> the driver so it can be re-used + introduction for a separate
> hypervisor

I understood splitting some of it so you could share code with the
pkvm side, but I don't see that it should be connected to the
host/guest driver. Surely that should be a generic pkvm-iommu driver
that is arch neutral, like virtio-iommu.

> With pKVM, the host kernel is not trusted, and if compromised it can
> instrument such attacks to corrupt hypervisor memory, so the hypervisor
> would lock io-pgtable-arm operations in EL2 to avoid that.

io-pgtable-arm has a particular set of locking assumptions, and the caller
has to follow them. When pkvm converts the hypercalls for the
para-virtualization into io-pgtable-arm calls, it also has to ensure it
follows io-pgtable-arm's locking model if it is going to use that as
its code base. This has nothing to do with the guest or trust; it is
just implementing concurrency correctly in pkvm.
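
A minimal sketch of what "the caller follows the locking model" can look like,
under a deliberately conservative assumption: one lock per domain serializing
every map/unmap. This is not a statement of what io-pgtable-arm actually
requires, and Domain, hyp_map and hyp_unmap are hypothetical names standing in
for the hypercall handlers.

```python
import errno
import threading


class Domain:
    """Toy stand-in for a hypervisor-side translation domain."""

    def __init__(self):
        self.lock = threading.Lock()  # assumed policy: one lock per domain
        self.table = {}               # iova_pfn -> pa_pfn


def hyp_map(domain, iova_pfn, pa_pfn):
    """Hypercall handler sketch: serialize, then update the table."""
    with domain.lock:
        if iova_pfn in domain.table:
            return -errno.EEXIST
        domain.table[iova_pfn] = pa_pfn
        return 0


def hyp_unmap(domain, iova_pfn):
    """Remove a mapping; returns the old frame, or None if nothing was mapped."""
    with domain.lock:
        return domain.table.pop(iova_pfn, None)
```

The serialization lives entirely in the handlers, so it holds no matter how
the caller behaves, which is the point: it is a correctness property of the
hypervisor code rather than a trust decision.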

> Yeah, SVA is tricky, I guess for that we would have to use nesting,
> but tbh, I don’t think it’s a deal breaker for now.

Again, it depends on what your actual use case for translation is inside
the host/guest environments. It would be good to clearly spell this out.
There are few drivers that directly manipulate the iommu_domains of a
device: a few GPUs, ath1x wireless, some tegra stuff, "venus". Which
of those are you targeting?

> > Lots of people have now done this, it is not really so bad. In
> > exchange you get a full architected feature set, better performance,
> > and are ready for HW optimizations.
> 
> It’s not impossible, it’s just more complicated doing it in the
> hypervisor which has limited features compared to the kernel + I haven’t
> seen any open source implementation for that except for Qemu which is in
> userspace.

People are doing it in their CC stuff, which is about the same as
pkvm. I'm not sure if it will be open source, I hope so since it needs
security auditing.

> > > - Add IDENTITY_DOMAIN support, I already have some patches for that, but
> > >   didn’t want to complicate this series, I can send them separately.
> > 
> > This seems kind of pointless to me. If you can tolerate identity (ie
> > pin all memory) then do nested, and maybe don't even bother with a
> > guest iommu.
> 
> As mentioned, the choice for para-virt was not only to avoid pinning,
> as this is the host, for IDENTITY_DOMAIN we either share the page table,
> then we have to deal with lazy mapping (SMMU features, BBM...) or mirror
> the table in a shadow SMMU only identity page table.

AFAIK you always have to mirror unless you significantly change how
the KVM S1 page table stuff is working. The CC people have made those
changes and won't mirror, so it is doable.

> > My advice for merging would be to start with the pkvm side setting up
> > a fully pinned S2 and do not have a guest driver. Nesting without
> > emulating smmuv3. Basically you get protected identity DMA support. I
> > think that would be a much less sprawling patch series. From there it
> > would be well positioned to add both smmuv3 emulation and a paravirt
> > iommu flow.
> 
> I am open to any suggestions, but I believe any solution considered for
> merge, should have enough features to be usable on actual systems (translating
> IOMMU can be used for example) so either para-virt as this series or full
> nesting as the PoC above (or maybe both?), which IMO comes down to the
> trade-off mentioned above.

IMHO no, you can have a completely usable solution without host/guest
controlled translation. This is equivalent to a bare metal system with
no IOMMU HW. This exists and is still broadly useful. The majority of
cloud VMs out there are in this configuration.

That is the simplest/smallest thing to start with. Adding host/guest
controlled translation is a build-on-top exercise that seems to have
a lot of options, and people may end up wanting to do all of them.

I don't think you need to show that host/guest controlled translation
is possible to make progress; of course it is possible. Just getting
to the point where pkvm can own the SMMU HW and provide DMA isolation
between all of its direct host/guest is a good step.

Jason
