[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <af0c3116-c110-095d-d250-0b6d56614a0b@huawei.com>
Date: Wed, 11 Aug 2021 10:07:27 +0800
From: "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
To: Will Deacon <will@...nel.org>
CC: Robin Murphy <robin.murphy@....com>,
Joerg Roedel <joro@...tes.org>,
linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
iommu <iommu@...ts.linux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC 0/8] iommu/arm-smmu-v3: add support for ECMDQ register
mode
On 2021/8/11 2:35, Will Deacon wrote:
> On Sat, Jun 26, 2021 at 07:01:22PM +0800, Zhen Lei wrote:
>> SMMU v3.3 added a new feature, which is Enhanced Command queue interface
>> for reducing contention when submitting Commands to the SMMU, in this
>> patch set, ECMDQ is the abbreviation of Enhanced Command Queue.
>>
>> When the hardware supports ECMDQ and each core can exclusively use one ECMDQ,
>> each core does not need to compete with other cores when using its own ECMDQ.
>> This means that each core can insert commands in parallel. If each ECMDQ can
>> execute commands in parallel, the overall performance may be better. However,
>> our hardware currently does not support multiple ECMDQ execute commands in
>> parallel.
>>
>> In order to reuse existing code, I originally still call arm_smmu_cmdq_issue_cmdlist()
>> to insert commands. Even so, however, there was a performance improvement of nearly 12%
>> in strict mode.
>>
>> The test environment is the EMU, which simulates the connection of the 200 Gbit/s NIC.
>> Number of queues: passthrough lazy strict(ECMDQ) strict(CMDQ)
>> 6 188 180 162 145 --> 11.7% improvement
>> 8 188 188 184 183 --> 0.55% improvement
>
> Sorry, I don't quite follow the numbers here. Why does the number of queues
> affect the classic "CMDQ" mode? We only have one queue there, right?
These queues indicates the network concurrency, maybe I should use channels or threads.
6 means six threads are deployed on different cores using their own channels to send
and receive network packets.
>
>> In recent days, I implemented a new function without competition with other
>> cores to replace arm_smmu_cmdq_issue_cmdlist() when a core can have an ECMDQ.
>> I'm guessing it might get better performance results. Because the EMU is too
>> slow, it will take a while before the relevant data is available.
>
> I'd certainly prefer to wait until we have something we know is
> representative.
Yes, it would be better to have an actual set of performance data. Now the EMU is
used to analyze hardware problems. This test has not been numbered yet.
> However, I can take the first four prep patches now if you
> respin the second one. At least that's then less for you to carry.
Great. Thank you. I will respin the second one.
>
> I'd also like review from the Arm side on this (and thank you for adopting
> the architecture unlike others seem to have done judging by the patches
> floating around).
>
> Will
> .
>
Powered by blists - more mailing lists