Message-ID: <ZBs6xlqMGYhLbI27@nvidia.com>
Date: Wed, 22 Mar 2023 14:28:38 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: "Tian, Kevin" <kevin.tian@...el.com>,
Robin Murphy <robin.murphy@....com>,
"will@...nel.org" <will@...nel.org>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"joro@...tes.org" <joro@...tes.org>,
"shameerali.kolothum.thodi@...wei.com"
<shameerali.kolothum.thodi@...wei.com>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add
arm_smmu_cache_invalidate_user
On Wed, Mar 22, 2023 at 10:11:33AM -0700, Nicolin Chen wrote:
> > Yes, there are a few different ways to handle this and still preserve
> > batching. It is part of the reason it would be hard to make the kernel
> > natively parse the commandq
>
> Yea. I think the way I described above might be the cleanest,
> since the host kernel would only handle all the leftover TLBI
> commands? I am open to other, better ideas if there are any.
It seems best to have userspace take a first pass over the cmdq and
then send what it didn't handle to the kernel.
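Something like this sketch, to make the split concrete. It is not the
real qemu/iommufd code: vsmmu_forward_invalidations(), op_is_tlbi() and
vsmmu_emulate_cmd() are made-up helpers, and only the 16-byte command
layout with the opcode in bits [7:0] of the first word comes from the
SMMUv3 spec. Userspace walks the guest CMDQ, emulates what it can
itself and batches the TLBI commands into one call to the kernel:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMDQ_ENT_DWORDS    2                 /* 16 bytes per command */
#define CMDQ_0_OP(word0)   ((word0) & 0xffULL)

struct vsmmu_cmd {
	uint64_t word[CMDQ_ENT_DWORDS];
};

/* Hypothetical: hand a batch of TLBI commands to the kernel in one call. */
int vsmmu_forward_invalidations(const struct vsmmu_cmd *cmds, size_t n);
/* Hypothetical: true for the TLBI_* opcodes that need HW invalidation. */
bool op_is_tlbi(uint64_t op);
/* Hypothetical: emulate a command entirely in userspace (CFGI, SYNC, ...). */
void vsmmu_emulate_cmd(const struct vsmmu_cmd *cmd);

/* prod/cons are already-masked queue indices into q[]. */
void vsmmu_process_cmdq(const struct vsmmu_cmd *q, size_t prod, size_t cons,
			size_t qmask)
{
	struct vsmmu_cmd batch[64];
	size_t n = 0;

	for (size_t i = cons; i != prod; i = (i + 1) & qmask) {
		uint64_t op = CMDQ_0_OP(q[i].word[0]);

		if (op_is_tlbi(op)) {
			batch[n++] = q[i];
			if (n == 64) {            /* flush a full batch */
				vsmmu_forward_invalidations(batch, n);
				n = 0;
			}
		} else {
			vsmmu_emulate_cmd(&q[i]); /* handled in userspace */
		}
	}
	if (n)
		vsmmu_forward_invalidations(batch, n);
}

That keeps the kernel out of the business of parsing the whole command
stream while still letting the invalidations be batched.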
> > On the other hand, we could add some more native kernel support for a
> > SW emulated vCMDQ and that might be interesting for performance.
>
> That's something I have thought about too. But it would feel
> like changing the "hardware" of the VM, right? If the host
> kernel enables nesting, then we'd have this extra queue for
> TLBI commands. From the driver perspective, it would feel
> like detecting an extra feature bit in the HW register, but
> there's no such bit in the SMMU HW spec :)
You'd trigger it the same way vCMDQ is triggered. It is basically a SW
emulated vCMDQ.
> Yet, would you please elaborate how it impacts performance?
> I can only see the benefit of isolation, from having a SW
> emulated VCMDQ exclusively for TLBI commands v.s. having a
> single CMDQ interlacing different commands, because both of
> them require trapping and some sort of dispatching.
In theory we could make it work like virtio-iommu, where the doorbell
ring for the SW emulated vCMDQ is delivered directly to a kernel
thread, chopping a bunch of latency out of it.
The issue is the latency to complete invalidation: in a vSVA scenario
the virtual process MM will block on IOMMU invalidation whenever it
does any mm_struct maintenance. I.e. you slow down a vast set of
operations. The less latency the better.
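As a rough sketch of the "doorbell straight to a kernel thread" idea,
KVM's existing ioeventfd mechanism already lets a guest MMIO write
signal an eventfd without bouncing through the main VMM loop. The
doorbell GPA and how the consumer side is wired up are assumptions, not
an existing interface:

#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define VCMDQ_DOORBELL_GPA  0x09010000ULL   /* hypothetical doorbell GPA */

int vsmmu_register_doorbell(int vm_fd)
{
	struct kvm_ioeventfd ioev = {
		.addr = VCMDQ_DOORBELL_GPA,
		.len  = 4,
		.fd   = eventfd(0, EFD_CLOEXEC),
	};

	if (ioev.fd < 0)
		return -1;

	/* Guest stores to the doorbell now just signal ioev.fd. */
	if (ioctl(vm_fd, KVM_IOEVENTFD, &ioev) < 0)
		return -1;

	/*
	 * Whatever consumes ioev.fd (a kernel thread if the kernel grows
	 * native vCMDQ support, or a dedicated userspace thread today)
	 * drains the queue without the vCPU exiting to the VMM.
	 */
	return ioev.fd;
}

That is the kind of path where the invalidation latency could get close
to what real vCMDQ HW would give.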
> Btw, just to confirm my understanding, a use case having two
> or more iommu_domains means an S2 iommu_domain replacement,
> right? I.e. a running S2 iommu_domain gets replaced on the fly
> by a different S2 iommu_domain holding a different VMID, while
> the IOAS still has the previous mappings? When would that
> actually happen in the real world?
It doesn't have to be a replace - what is needed is that every vPCI
device connected to the same SMMU instance be using the same S2 and
thus the same VM_ID.
IOW every SID must be linked to the same VM_ID or invalidation commands
will not be properly processed.
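In VMM terms the invariant is something like the below sketch; struct
vsmmu and vsmmu_attach_device() are hypothetical, the point is only
that a vSMMU instance pins exactly one S2 HWPT (and hence one VM_ID):

#include <errno.h>
#include <stdint.h>

struct vsmmu {
	uint32_t s2_hwpt_id;	/* 0 until the first device is attached */
};

int vsmmu_attach_device(struct vsmmu *vsmmu, uint32_t dev_s2_hwpt_id)
{
	if (!vsmmu->s2_hwpt_id) {
		vsmmu->s2_hwpt_id = dev_s2_hwpt_id;
		return 0;
	}

	/*
	 * A second S2 behind the same vSMMU would mean a second VM_ID,
	 * and the guest's invalidations would only reach one of them.
	 * Such a device belongs on a different vSMMU instance instead.
	 */
	if (vsmmu->s2_hwpt_id != dev_s2_hwpt_id)
		return -EINVAL;

	return 0;
}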
qemu would have to have multiple SMMU instances according to the S2
domains, which is probably true anyhow since we need to know which
physical SMMU instance to deliver the invalidation to.
Jason