Message-ID: <8967e349-3af1-af17-dfa2-187d06dca18c@gmail.com>
Date: Mon, 7 May 2018 20:38:31 +0300
From: Dmitry Osipenko <digetx@...il.com>
To: Joerg Roedel <joro@...tes.org>
Cc: Robin Murphy <robin.murphy@....com>,
Thierry Reding <thierry.reding@...il.com>,
linux-tegra@...r.kernel.org, iommu@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org,
Jonathan Hunter <jonathanh@...dia.com>
Subject: Re: [PATCH v1 4/4] iommu/tegra: gart: Optimize map/unmap
On 07.05.2018 18:51, Dmitry Osipenko wrote:
[snip]
> Secondly, the interesting part is that mapping / unmapping of a contiguous
> allocation (CMA using the DMA API) is slower by ~50% than doing it for a sparse
> allocation (get_pages using the bare IOMMU API). /I think/ it's a shortcoming of
> arch/arm/mm/dma-mapping.c, which also suffers from other inflexibilities that
> Thierry faced recently. I haven't really tried to figure out what the
> bottleneck is yet, and since Thierry was going to rewrite ARM's dma-mapping
> implementation anyway, I'll take a closer look at this issue a bit later.
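For reference, the two paths being compared look roughly like this. This is a
minimal sketch, not the actual test code; the function names, the domain setup
and the lack of error handling on iommu_map() are all simplifications:

#include <linux/dma-mapping.h>
#include <linux/iommu.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/* Contiguous path: one CMA buffer; on ARM the dma-mapping code maps it
 * through the IOMMU behind the caller's back. */
static void *map_contiguous(struct device *dev, size_t size, dma_addr_t *dma)
{
	return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
}

/* Sparse path: individual pages mapped one by one with the bare IOMMU API
 * (allocation error handling omitted for brevity). */
static int map_sparse(struct iommu_domain *domain, unsigned long iova,
		      size_t size)
{
	unsigned int i;

	for (i = 0; i < size >> PAGE_SHIFT; i++) {
		struct page *page = alloc_page(GFP_KERNEL);

		if (!page)
			return -ENOMEM;

		iommu_map(domain, iova, page_to_phys(page), PAGE_SIZE,
			  IOMMU_READ | IOMMU_WRITE);
		iova += PAGE_SIZE;
	}

	return 0;
}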
Please scratch my accusation against ARM's dma-mapping, it's not the culprit at
all. I completely forgot that in the case of a sparse allocation the display's
framebuffer IOMMU mapping is "pinned" to the GART and hence it isn't getting
dynamically mapped / unmapped during my testing. I also forgot to set the CPU
freq governor to "performance", which reduced the above perf difference from 50%
to 20%. The rest of the testing is unaffected: flushing after the whole mapping
is still much more efficient than flushing after the modification of each page
entry. And yet again, the performance of sparse mapping is nearly the same as
that of contiguous mapping unless the sparse allocation is large and _very_
fragmented.
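To illustrate the flushing point, here is a sketch of the two patterns. The
helpers gart_set_pte() and gart_flush() are made-up placeholders for the
driver's PTE write and GART cache flush, not the actual gart.c functions:

/* Before: flush the GART after every single PTE update. */
static void map_range_flush_each(unsigned long iova, phys_addr_t paddr,
				 size_t size)
{
	size_t off;

	for (off = 0; off < size; off += PAGE_SIZE) {
		gart_set_pte(iova + off, paddr + off);
		gart_flush();		/* expensive MMIO round-trip per page */
	}
}

/* After: write all PTEs first, flush the GART once at the end. */
static void map_range_flush_once(unsigned long iova, phys_addr_t paddr,
				 size_t size)
{
	size_t off;

	for (off = 0; off < size; off += PAGE_SIZE)
		gart_set_pte(iova + off, paddr + off);

	gart_flush();			/* single flush for the whole mapping */
}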