Message-ID: <498100e8-e94e-4a65-a9e1-ae59bd59fe2d@broadcom.com>
Date: Mon, 5 Jun 2017 11:03:15 -0700
From: Ray Jui <ray.jui@...adcom.com>
To: Will Deacon <will.deacon@....com>
Cc: Marc Zyngier <marc.zyngier@....com>,
Robin Murphy <robin.murphy@....com>,
Mark Rutland <mark.rutland@....com>,
Joerg Roedel <joro@...tes.org>,
linux-arm-kernel@...ts.infradead.org,
iommu@...ts.linux-foundation.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Device address specific mapping of arm,mmu-500

Hi Will/Robin,

Just wanted to check with you on this again. Do you have a very rough
timeline for when the excessive locking in the IOMMU driver may be fixed
(so we can get back the expected ~95% of performance)?

Thanks,

Ray

On 5/31/17 10:32 AM, Ray Jui wrote:
> Hi Will,
>
> On 5/31/17 5:44 AM, Will Deacon wrote:
>> On Tue, May 30, 2017 at 11:13:36PM -0700, Ray Jui wrote:
>>> I did a little more digging myself, and I think I now understand what you
>>> meant by identity mapping, i.e., configuring the MMU-500 with a 1:1 mapping
>>> between the IOVA (the DMA address) and the physical address.
>>>
>>> I think that should work. In the end, because of this MSI write parsing
>>> issue in our PCIe controller, the reason to use the IOMMU is to allow the
>>> cache attributes (AxCACHE) of the MSI writes towards the GICv3 ITS to be
>>> modified by the IOMMU to Device type, while leaving the rest of the inbound
>>> reads/writes from/to DDR with more optimized cache attribute settings, so
>>> that I/O coherency can remain enabled for the PCIe controller. In fact, the
>>> PCIe controller itself is fully capable of DMA to/from the full address
>>> space of our SoC, including both DDR and any device memory.
>>>
>>> The 1:1 mapping will still incur some translation overhead, as you
>>> suggested; however, the overhead of allocating page tables and locking will
>>> be gone. This sounds like the best possible option I have currently.
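
A minimal sketch of the 1:1 mapping with per-page attributes described
above, using the generic kernel IOMMU API. This is only an illustration:
the domain/attach flow, the DDR bounds and the ITS doorbell address
passed in below are placeholders, not values from this thread.

#include <linux/iommu.h>
#include <linux/device.h>
#include <linux/sizes.h>

/*
 * Hypothetical helper: install an IOVA == PA map for one PCIe master,
 * keeping DDR Normal/cacheable (so I/O coherency is preserved) while
 * forcing the GITS_TRANSLATER doorbell page to Device/MMIO attributes.
 */
static int pcie_setup_identity_map(struct device *dev,
				   phys_addr_t ddr_base, size_t ddr_size,
				   phys_addr_t its_doorbell)
{
	struct iommu_domain *domain;
	int ret;

	/* An unmanaged domain, so the mappings below are under our control */
	domain = iommu_domain_alloc(dev->bus);
	if (!domain)
		return -ENOMEM;

	/* DDR: IOVA == PA, normal cacheable */
	ret = iommu_map(domain, ddr_base, ddr_base, ddr_size,
			IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
	if (ret)
		goto out_free;

	/* MSI doorbell: IOVA == PA, but Device-type attributes */
	ret = iommu_map(domain, its_doorbell, its_doorbell, SZ_4K,
			IOMMU_READ | IOMMU_WRITE | IOMMU_MMIO);
	if (ret)
		goto out_free;

	return iommu_attach_device(domain, dev);

out_free:
	iommu_domain_free(domain);
	return ret;
}
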
>>
>> It might end up being pretty invasive to work around a hardware bug, so
>> we'll have to see what it looks like. Ideally, we could just use the SMMU
>> for everything as-is and work on clawing back the lost performance (it
>> should be possible to get ~95% of the perf if we sort out the locking, which
>> we *are* working on).
>>
>
> If 95% of performance can be achieved by fixing the locking in the
> driver, then that's great news.
>
> If you have anything that you want me to help test, feel free to send it
> out. I will be more than happy to help test it and let you know
> the performance numbers. :)
>
>>> May I ask, how do I start trying to get this identity mapping to work as an
>>> experiment and proof of concept? Any pointer or advice is highly appreciated,
>>> as you can see I'm not very experienced with this. I found that Will recently
>>> added IOMMU_DOMAIN_IDENTITY support to the arm-smmu driver, but I suppose
>>> that bypasses the SMMU completely, instead of still going through the MMU
>>> with a 1:1 translation. Is my understanding correct?
>>
>> Yes, I don't think IOMMU_DOMAIN_IDENTITY is what you need, because you
>> actually need per-page control of memory attributes.
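
As a side note on the point above: IOMMU_DOMAIN_IDENTITY is a whole-device
bypass, so there is no map/unmap path and hence no way to single out the MSI
doorbell. A trivial, purely illustrative check of which default domain type a
device ended up with:

#include <linux/iommu.h>

/* Illustrative only: true if the device's default domain is full bypass. */
static bool dev_uses_smmu_bypass(struct device *dev)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	return domain && domain->type == IOMMU_DOMAIN_IDENTITY;
}
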
>>
>> Robin might have a better idea, but I think you'll have to hack dma-iommu.c
>> so that you can have a version of the DMA ops that:
>>
>> * Initialises the identity map (I guess as normal WB cacheable?)
>> * Reserves and maps the MSI region appropriately
>> * Just returns the physical address as the dma address for map requests
>>   (returning an error for the MSI region)
>> * Does nothing for unmap requests
>>
>> But my strong preference would be to fix the locking overhead from the
>> SMMU so that the perf hit is acceptable.
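
A very rough sketch of the kind of DMA ops outlined in the list above,
assuming the static identity map has already been installed (for example as
in the earlier sketch). The msi_base/msi_size window is a placeholder, and
the dma_map_ops field names follow kernels of roughly this era:

#include <linux/dma-mapping.h>
#include <linux/mm.h>

/* Placeholder bounds of the reserved MSI window (not from this thread). */
static phys_addr_t msi_base;
static size_t msi_size;

/*
 * With a static IOVA == PA map in place, "mapping" is just handing back
 * the physical address; DMA overlapping the reserved MSI window is
 * refused, and unmap has nothing to tear down.
 */
static dma_addr_t identity_map_page(struct device *dev, struct page *page,
				    unsigned long offset, size_t size,
				    enum dma_data_direction dir,
				    unsigned long attrs)
{
	phys_addr_t phys = page_to_phys(page) + offset;

	if (phys >= msi_base && phys < msi_base + msi_size)
		return (dma_addr_t)-1;	/* stand-in for the arch's DMA error cookie */

	return phys;			/* IOVA == PA */
}

static void identity_unmap_page(struct device *dev, dma_addr_t handle,
				size_t size, enum dma_data_direction dir,
				unsigned long attrs)
{
	/* Nothing to do: the identity map is static. */
}

static const struct dma_map_ops identity_dma_ops = {
	.map_page	= identity_map_page,
	.unmap_page	= identity_unmap_page,
	/* .map_sg, .unmap_sg, .alloc, .free etc. would need equivalents */
};
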
>
> Yes, I agree, we want to be able to use the SMMU in the intended way. Do
> you have a timeline for when the locking issue may be fixed (or
> improved)? Depending on the timeline, we may still need to go with
> identity mapping on our side as a temporary solution until the fix lands.
>
>>
>> Will
>>
>
> Thanks,
>
> Ray
>