Message-ID: <ZBtZ7F3NsxngcKIq@nvidia.com>
Date: Wed, 22 Mar 2023 16:41:32 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: "Tian, Kevin" <kevin.tian@...el.com>,
Robin Murphy <robin.murphy@....com>,
"will@...nel.org" <will@...nel.org>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"joro@...tes.org" <joro@...tes.org>,
"shameerali.kolothum.thodi@...wei.com"
<shameerali.kolothum.thodi@...wei.com>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add
arm_smmu_cache_invalidate_user
On Wed, Mar 22, 2023 at 12:21:27PM -0700, Nicolin Chen wrote:
> Do you prefer this to happen with this series?
No, I just don't want to exclude doing it someday if people are
interested in optimizing this. As I said in the other thread, I'd rather
optimize SMMUv3 emulation than try to use virtio-iommu to make it run
faster.
> the uAPI would be completely compatible. It seems to me that
> we would need a different uAPI, so as to set up a queue in an
> earlier stage, and then to ring a bell when QEMU traps any
> incoming commands in the emulated VCMDQ.
Yes, it would need more uAPI. Let's just make sure there is room and
maybe think a bit about what it would look like.
You should also draft through the HW vCMDQ stuff to ensure it fits
in here nicely.
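Very roughly, I could imagine the extra uAPI looking something like the
below - every name here is invented purely to sketch the shape, nothing
like this exists or is being proposed as-is:

/* Hypothetical only: a queue object tied to a nested HWPT, plus a
 * doorbell the VMM kicks after trapping the guest's PROD write. */
#include <linux/types.h>

struct iommu_vcmdq_alloc {
        __u32 size;             /* sizeof(struct iommu_vcmdq_alloc) */
        __u32 flags;
        __u32 hwpt_id;          /* nested S1 HWPT this queue serves */
        __u32 out_vcmdq_id;     /* returned handle for the queue object */
        __u64 base;             /* guest address of the emulated CMDQ */
        __u32 log2size;         /* number of queue entries, power of two */
        __u32 __reserved;
};

struct iommu_vcmdq_kick {
        __u32 size;
        __u32 vcmdq_id;
        __u32 prod;             /* producer index the guest just wrote */
        __u32 __reserved;
};

The only point is that whatever we add now shouldn't make a
queue-plus-doorbell flow like this impossible to bolt on later.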
> > > Btw, just to confirm my understanding, a use case having two
> > > or more iommu_domains means an S2 iommu_domain replacement,
> > > right? I.e. a running S2 iommu_domain gets replaced on the fly
> > > by a different S2 iommu_domain holding a different VMID, while
> > > the IOAS still has the previous mappings? When would that
> > > actually happen in the real world?
> >
> > It doesn't have to be replace - what is needed is that every vPCI
> > device connected to the same SMMU instance be using the same S2 and
> > thus the same VM_ID.
> >
> > IOW every SID must be linked to the same VM_ID or invalidation commands
> > will not be properly processed.
> >
> > qemu would have to have multiple SMMU instances according to S2
> > domains, which is probably true anyhow since we need to know which
> > physical SMMU instance to deliver the invalidation to.
>
> I am not 100% following this part. So, you mean that we're
> safe if we only have one SMMU instance, because there'd be
> only one S2 domain, while multiple S2 domains would happen
> if we have multiple SMMU instances?
Yes, that would happen today, especially since each SMMU has its own
vm_id allocator IIRC.
> Can we still use the same S2 domain for multiple instances?
I think not today.
At the core, if we share the same S2 domain then it is a problem to
figure out which SMMU instance to send the invalidation command to. E.g.
if userspace invalidates ASID 1 you'd have to replicate the
invalidation to all SMMU instances, even if ASID 1 is used by only a
single SID/STE that has a single SMMU instance backing it.
So I think for ARM we want to reflect the physical SMMU instances into
vSMMU instances and that feels best done by having a unique S2
iommu_domain for each SMMU instance. Then we know that an invalidation
for a SMMU instance is delivered to that S2's singular CMDQ and things
like vCMDQ become possible.
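Concretely, something like this per-instance association (all names
invented, just to illustrate the 1:1 routing):

/* Sketch only: one vSMMU per physical SMMU. An invalidation trapped on
 * vsmmu[i] has exactly one destination - the S2/VMID and CMDQ of
 * physical instance i - so no fan-out is ever needed. */
#include <stdint.h>

struct vsmmu_backend {
        int      iommufd;       /* iommufd handle for this instance */
        uint32_t s2_hwpt_id;    /* per-SMMU S2, owns that SMMU's VMID */
        uint32_t s1_hwpt_id;    /* nested S1 to deliver invalidations to */
};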
> Our approach of setting up a stage-2 mapping in QEMU is to
> map the entire guest memory. I don't see a point in having
> a separate S2 domain, even if there are multiple instances?
And then this is the drawback: we don't really want to have duplicated
S2 page tables in the system, one for every stage-2 domain.
Maybe we have made a mistake by allowing the S2 to be an unmanaged
domain. Perhaps we should create the S2 out of an unmanaged domain
like the S1.
Then the rules could be:
 - Unmanaged domain can be used with every SMMU instance, only one
   copy of the page table. The ASID in the iommu_domain is
   kernel-global
 - S2 domain is a child of a shared unmanaged domain. It can be used
   only with the SMMU it is associated with, and has a per-SMMU VM ID
 - S1 domain is a child of an S2 domain; it can be used only with the
   SMMU its S2 is associated with, just because
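To make that concrete, the allocation flow on the VMM side could look
roughly like the below - every type and helper is invented, only the
parent/child nesting matters:

/* Sketch only: invented types/helpers to show the nesting rules,
 * not real iommufd or driver code. */
struct smmu_inst;                       /* one physical SMMU */
struct dom;                             /* an iommu_domain handle */

struct dom *alloc_unmanaged(int ioas_id);                  /* shared S2 PT */
struct dom *alloc_s2(struct smmu_inst *s, struct dom *pt); /* per-SMMU VMID */
struct dom *alloc_s1(struct smmu_inst *s, struct dom *s2); /* guest-owned S1 */

static void build_domains(struct smmu_inst **smmu, int n, int ioas_id)
{
        /* One copy of the guest PA page table, usable on every instance */
        struct dom *shared_pt = alloc_unmanaged(ioas_id);

        for (int i = 0; i < n; i++) {
                /* S2 child: only valid on smmu[i], holds its VM ID */
                struct dom *s2 = alloc_s2(smmu[i], shared_pt);
                /* S1 child: only valid on smmu[i], because its S2 is */
                struct dom *s1 = alloc_s1(smmu[i], s2);
                (void)s1;       /* devices behind smmu[i] attach here */
        }
}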
> Btw, from a private discussion with Eric, he expressed the
> difficulty of adding multiple SMMU instances in QEMU, as it
> would complicate the device and ACPI components.
I'm not surprised by this, but for efficiency we probably have to do
this. Eric, am I wrong?
qemu shouldn't have to do it immediately, but the kernel uAPI should
allow for a VMM that is optimized. We shouldn't exclude this by
mis-designing the kernel uAPI. qemu can replicate the invalidations
itself to make an inefficient single vSMMU.
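i.e. something along these lines on the QEMU side (invented names, just
to show the fan-out a single vSMMU forces, unlike the per-instance case
sketched earlier):

#include <stddef.h>
#include <stdint.h>

/* Sketch only: one vSMMU fronting N physical SMMUs can't tell which
 * instance an ASID/SID lives on, so every trapped guest invalidation
 * gets replicated to each instance's nested HWPT. */
struct phys_smmu {
        int      iommufd;       /* iommufd fd for this physical SMMU */
        uint32_t s1_hwpt_id;    /* nested HWPT to invalidate through */
};

/* hypothetical wrapper around the invalidation ioctl in this series */
int hwpt_invalidate(int iommufd, uint32_t hwpt_id,
                    const void *cmds, size_t len);

static void vsmmu_fanout_invalidation(struct phys_smmu *be, int n,
                                      const void *cmds, size_t len)
{
        for (int i = 0; i < n; i++)
                hwpt_invalidate(be[i].iommufd, be[i].s1_hwpt_id, cmds, len);
}

That works, it is just wasted work on every invalidation compared to
routing directly to the one instance that needs it.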
> For VCMDQ, we do need a multi-instance environment, because there
> are multiple physical pairs of SMMU+VCMDQ, i.e. multiple VCMDQ MMIO
> regions being attached/used by different devices.
Yes. IMHO vCMDQ is the sane design here - invalidation performance is
important, and having a kernel-bypass way to do it is ideal. I understand
AMD has a similar kernel-bypass queue approach for their stuff too. I
think everyone will eventually need to do this, especially for CC
applications. Having the hypervisor able to interfere with
invalidation feels like an attack vector.
So we should focus on long-term designs that allow kernel-bypass to
work, and I don't see a way to hide multi-instance and still truly
support vCMDQ.
> So, I have been exploring a different approach by creating an
> internal multiplication inside VCMDQ...
How can that work?
You'd have to have the guest VM know to replicate to different
vCMDQs, which isn't the standard SMMU programming model anymore...
Jason