Message-ID: <20170817165040.GK2853@suse.de>
Date: Thu, 17 Aug 2017 18:50:40 +0200
From: Joerg Roedel <jroedel@...e.de>
To: Will Deacon <will.deacon@....com>
Cc: Joerg Roedel <joro@...tes.org>, iommu@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org,
Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>,
Alex Williamson <alex.williamson@...hat.com>,
Robin Murphy <robin.murphy@....com>
Subject: Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
Hi Will,
On Thu, Aug 17, 2017 at 05:32:35PM +0100, Will Deacon wrote:
> I really like the idea of this, but I have a couple of questions and
> comments below.
Great. This, together with the common iova-flush code, should make it
possible to solve the performance problems of the dma-iommu code.
> > +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> > +		   phys_addr_t paddr, size_t size, int prot)
> > +{
> > +	int ret = iommu_map(domain, iova, paddr, size, prot);
> > +
> > +	iommu_tlb_range_add(domain, iova, size);
> > +	iommu_tlb_sync(domain);
>
> Many IOMMUs don't need these callbacks on ->map operations, but they won't
> be able to distinguish them easily with this API. Could you add a flags
> parameter or something to the iommu_tlb_* functions, please?
Yeah, this is only needed for virtualized IOMMUs that have a non-present
cache. My idea was to let the iommu drivers tell the common code whether
the iommu needs it; the code above then just checks a flag and omits the
calls to the flush functions.
The problem currently is how to get this information from
'struct iommu_device' to 'struct iommu_domain'. As a workaround I am
considering a per-domain flag in the iommu drivers that tracks whether
any unmap has happened, so the flush call-back can just do nothing if
there was none.
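Concretely, the map path above could then look something like this
('flush_np_cache' is just a made-up flag name for the sketch, not part
of the posted patch):

	int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
			   phys_addr_t paddr, size_t size, int prot)
	{
		int ret = iommu_map(domain, iova, paddr, size, prot);

		/* Assumed flag: set only by drivers for virtualized
		 * IOMMUs that cache non-present entries. */
		if (domain->flush_np_cache) {
			iommu_tlb_range_add(domain, iova, size);
			iommu_tlb_sync(domain);
		}

		return ret;
	}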
> I think we will struggle to implement this efficiently on ARM SMMUv3. The
> way invalidation works there is that there is a single in-memory command
> queue into which we can put TLB invalidation commands (they are inserted
> under a lock). These are then processed asynchronously by the hardware, and
> you can complete them by inserting a SYNC command and waiting for that to
> be consumed by the SMMU. Sounds like a perfect fit, right?
Yes, it's basically the same way it works on AMD-Vi and Intel VT-d.
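In pseudo-code, the shared model is roughly this (names are
illustrative only, not taken from any of the drivers):

	spin_lock(&cmdq->lock);
	cmdq_insert(cmdq, inv_range_cmd(iova, size)); /* async invalidation */
	spin_unlock(&cmdq->lock);

	/* ... more invalidations can be batched here ... */

	spin_lock(&cmdq->lock);
	cmdq_insert(cmdq, sync_cmd());
	spin_unlock(&cmdq->lock);

	/* consumption of the SYNC implies all prior invalidations
	 * have completed */
	wait_until_consumed(cmdq);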
> The problem is that we want to add those invalidation commands as early
> as possible, so that they can be processed by the hardware concurrently
> with us unmapping other pages.
I think that's a bad idea, because it re-introduces the performance
problem: everyone will spin on the cmd-queue lock in the unmap path of
the DMA-API.
> That means adding the invalidation commands in the ->unmap callback
> and not bothering to implement ->iotlb_range_add callback at all.
> Then, we will do the sync in ->iotlb_sync. This falls apart if
> somebody decides to use iommu_flush_tlb_all(), where we would prefer
> not to insert all of the invalidation commands in unmap and instead
> insert a single invalidate-all command, followed up with a SYNC.
This problem can be solved with the deferred iova flushing code I posted
to the ML. When a queue fills up, iommu_flush_tlb_all() is called and
every entry that was unmapped before can be released. This works well on
x86; are there reasons it wouldn't on ARM?
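(Rough sketch of the deferred-flush idea, assuming a simple fixed-size
queue; the names and the fq_release_all() helper are illustrative, not
taken from the posted series:)

	struct flush_queue_entry {
		unsigned long iova_pfn;
		unsigned long pages;
	};

	struct flush_queue {
		unsigned int next;
		struct flush_queue_entry entries[FQ_SIZE];
	};

	static void fq_add(struct iommu_domain *domain, struct flush_queue *fq,
			   unsigned long iova_pfn, unsigned long pages)
	{
		if (fq->next == FQ_SIZE) {
			/* Queue full: one flush-all makes every queued
			 * range safe to reuse, then all entries go back
			 * to the iova allocator. */
			iommu_flush_tlb_all(domain);
			fq_release_all(fq);	/* illustrative helper */
			fq->next = 0;
		}

		fq->entries[fq->next].iova_pfn = iova_pfn;
		fq->entries[fq->next].pages    = pages;
		fq->next++;
	}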
Regards,
Joerg