Message-ID: <20150929092714.GD9460@ulmo.nvidia.com>
Date: Tue, 29 Sep 2015 11:27:14 +0200
From: Thierry Reding <thierry.reding@...il.com>
To: Tomasz Figa <tfiga@...omium.org>
Cc: iommu@...ts.linux-foundation.org, Joerg Roedel <joro@...tes.org>,
Hiroshi Doyu <hdoyu@...dia.com>,
Stephen Warren <swarren@...dotorg.org>,
Alexandre Courbot <gnurou@...il.com>,
Vince Hsu <vince.h@...dia.com>,
Russell King <rmk+kernel@....linux.org.uk>,
Paul Walmsley <paul@...an.com>,
Tomeu Vizoso <tomeu.vizoso@...labora.com>,
Mikko Perttunen <mperttunen@...dia.com>,
Will Deacon <will.deacon@....com>,
Alex Williamson <alex.williamson@...hat.com>,
Arnd Bergmann <arnd@...db.de>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Antonios Motakis <a.motakis@...tualopensystems.com>,
Olav Haugan <ohaugan@...eaurora.org>,
Nicolas Iooss <nicolas.iooss_linux@....org>,
linux-kernel@...r.kernel.org, linux-tegra@...r.kernel.org
Subject: Re: [RFC PATCH 0/3] iommu: Add range flush operation
On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> page by page; however, it flushes the caches (CPU and IOMMU) for every
> mapped page separately, which is unsuitable for use cases that require
> low mapping latency. Similarly, iommu_unmap(), even though it takes a
> full IOVA range as its argument, performs the unmapping in a
> page-by-page manner.
>
> To make the mapping operation more suitable for such use cases, iommu_map_sg()
> and the .map_sg() callback in the iommu_ops struct were introduced, which
> allow individual IOMMU drivers to iterate directly over SG entries, create
> the necessary mappings and flush everything in one go.
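If I understand correctly, such a batching .map_sg() boils down to roughly
the sketch below (the foo_*() helpers are made up and the callback
signature is quoted from memory, so take it with a grain of salt):

        static size_t foo_iommu_map_sg(struct iommu_domain *domain,
                                       unsigned long iova,
                                       struct scatterlist *sg,
                                       unsigned int nents, int prot)
        {
                struct scatterlist *s;
                size_t mapped = 0;
                unsigned int i;

                for_each_sg(sg, s, nents, i) {
                        /* set up PTEs only, no cache maintenance here */
                        if (foo_map_pages(domain, iova + mapped, sg_phys(s),
                                          s->length, prot))
                                break;

                        mapped += s->length;
                }

                /* flush CPU caches and IOMMU TLB once for the whole range */
                foo_flush_range(domain, iova, mapped);

                return mapped;
        }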
>
> This approach, however, has two drawbacks:
> 1) it does not do anything about unmap performance,
> 2) it requires each driver that wants fast mapping to implement its
> own SG iteration code, even though this is a mostly generic operation.
>
> This series tries to mitigate the two issues above, while acknowledging
> the fact that the .map_sg() callback might still be necessary for some
> specific platforms, which might need to iterate over SG elements inside
> driver code. The proposed solution introduces a new .flush() callback,
> which takes an IOVA range as its argument and is expected to flush all
> relevant caches (be it CPU, IOMMU TLB or whatever) to make mapping
> changes in the given IOVA area visible to IOMMU clients. Then all 3
> basic map/unmap operations are modified to call the .flush() callback
> at the end of the operation.
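Just to make sure I'm reading this right: the ops would grow something
along these lines, and the core would call it at the end of the
map/unmap paths. The exact signature below is my guess from the
description, not taken from the patches:

        struct iommu_ops {
                /* ... existing callbacks ... */

                /* make mapping changes in [iova, iova + size) visible */
                void (*flush)(struct iommu_domain *domain,
                              unsigned long iova, size_t size);
        };

and the core would then do, roughly:

        size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
                            struct scatterlist *sg, unsigned int nents,
                            int prot)
        {
                size_t mapped = domain->ops->map_sg(domain, iova, sg,
                                                    nents, prot);

                /* optional: drivers without .flush() keep flushing per page */
                if (domain->ops->flush)
                        domain->ops->flush(domain, iova, mapped);

                return mapped;
        }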
>
> Advantages of the proposed approach include:
> 1) the ability to use the default_iommu_map_sg() helper if all the driver
> needs for performance optimization is batching the flush,
> 2) no effect at all on existing code - the .flush() callback is made
> optional and if it isn't implemented, drivers are expected to do the
> necessary flushes on a page-by-page basis in their respective (un)mapping
> callbacks,
> 3) the possibility of exporting the iommu_flush() operation and providing
> unsynchronized map/unmap operations for subsystems with even higher
> performance requirements (e.g. drivers/gpu/drm).
That would require passing in some sort of flag that the core shouldn't
be flushing itself, right? Currently it would flush on every map/unmap.
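What I have in mind is something along these lines (all names below are
hypothetical, iommu_map_unsync() in particular doesn't exist; whether it
ends up being a separate variant or a flag to iommu_map() doesn't matter
much):

        struct foo_bo {
                unsigned long iova;
                phys_addr_t paddr;
                size_t size;
        };

        /* batch-map buffer objects, flushing only once at the end */
        static int foo_drm_map_buffers(struct iommu_domain *domain,
                                       struct foo_bo *bos, unsigned int count)
        {
                size_t total = 0;
                unsigned int i;
                int err;

                for (i = 0; i < count; i++) {
                        /* hypothetical variant that skips the implicit flush */
                        err = iommu_map_unsync(domain, bos[i].iova,
                                               bos[i].paddr, bos[i].size,
                                               IOMMU_READ | IOMMU_WRITE);
                        if (err < 0)
                                return err;

                        total += bos[i].size;
                }

                /* single, explicit flush for the whole batch (assuming the
                 * BOs were mapped at consecutive IOVAs) */
                iommu_flush(domain, bos[0].iova, total);

                return 0;
        }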
>
> The series includes a generic patch implementing the necessary changes in
> the IOMMU API and two Tegra-specific patches that demonstrate the
> implementation on the driver side and can be used for further testing.
>
> Last, but not least, some performance numbers on Tegra210:
> +-----------+--------------+-------------+------------+
> | Operation | Size [bytes] | Before [us] | After [us] |
> +-----------+--------------+-------------+------------+
> | Map | 128K | 139 | 40 |
> | | | 136 | 34 |
> | | | 137 | 38 |
> | | | 136 | 36 |
> | | 4M | 3939 | 1163 |
> | | | 3730 | 2389 |
> | | | 3613 | 997 |
> | | | 3622 | 1620 |
> | | ~18M | 18635 | 4741 |
> | | | 19261 | 6550 |
> | | | 18473 | 9304 |
> | | | 18125 | 5120 |
> | Unmap | 128K | 128 | 7 |
> | | | 122 | 8 |
> | | | 119 | 10 |
> | | | 123 | 12 |
> | | 4M | 3829 | 151 |
> | | | 3964 | 150 |
> | | | 3908 | 145 |
> | | | 3875 | 155 |
> | | ~18M | 18570 | 683 |
> | | | 18473 | 806 |
> | | | 21020 | 643 |
> | | | 21764 | 652 |
> +-----------+--------------+-------------+------------+
> The values are obtained by surrounding the calls to iommu_map_sg()
> (with default_iommu_map_sg() helper used as .map_sg() callback) and
> iommu_unmap() with ktime-based time measurement code. 4 samples were
> taken for every buffer size. ~18M means around 17-19M due to the
> variance in requested buffer sizes.
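I assume the measurement is essentially the equivalent of the sketch
below, with ktime_get() around the calls (the domain/iova/sgt/prot
variables here are placeholders):

        ktime_t before, after;
        size_t mapped;

        before = ktime_get();
        mapped = iommu_map_sg(domain, iova, sgt->sgl, sgt->nents, prot);
        after = ktime_get();

        pr_info("iommu_map_sg: %zu bytes in %lld us\n", mapped,
                ktime_us_delta(after, before));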
Those are pretty impressive numbers.
Thierry