[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZBifWkY2Lt/U1Z7R@nvidia.com>
Date: Mon, 20 Mar 2023 15:00:58 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: "Tian, Kevin" <kevin.tian@...el.com>,
Robin Murphy <robin.murphy@....com>,
"will@...nel.org" <will@...nel.org>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"joro@...tes.org" <joro@...tes.org>,
"shameerali.kolothum.thodi@...wei.com"
<shameerali.kolothum.thodi@...wei.com>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add
arm_smmu_cache_invalidate_user
On Mon, Mar 20, 2023 at 09:12:06AM -0700, Nicolin Chen wrote:
> On Mon, Mar 20, 2023 at 09:59:23AM -0300, Jason Gunthorpe wrote:
> > On Fri, Mar 17, 2023 at 09:41:34AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@...dia.com>
> > > > Sent: Saturday, March 11, 2023 12:20 AM
> > > >
> > > > What I'm broadly thinking is if we have to make the infrastructure for
> > > > VCMDQ HW accelerated invalidation then it is not a big step to also
> > > > have the kernel SW path use the same infrastructure just with a CPU
> > > > wake up instead of a MMIO poke.
> > > >
> > > > Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> > > > support.
> > > >
> > >
> > > I thought about this in VT-d context. Looks there are some difficulties.
> > >
> > > The most prominent one is that head/tail of the VT-d invalidation queue
> > > are in MMIO registers. Handling it in kernel iommu driver suggests
> > > reading virtual tail register and updating virtual head register. Kind of
> > > moving some vIOMMU awareness into the kernel which, iirc, is not
> > > a welcomed model.
> >
> > qemu would trap the MMIO and generate an IOCTL with the written head
> > pointer. It isn't as efficient as having the kernel do the trap, but
> > does give batching.
>
> Rephrasing that to put into a design: the IOCTL would pass a
> user pointer to the queue, the size of the queue, then a head
> pointer and a tail pointer? Then the kernel reads out all the
> commands between the head and the tail and handles all those
> invalidation commands only?
Yes, that is one possible design
Jason
Powered by blists - more mailing lists