Date: Thu, 19 Oct 2017 11:00:45 +0800
From: "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
To: Will Deacon <will.deacon@....com>
CC: Joerg Roedel <joro@...tes.org>, linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>, iommu <iommu@...ts.linux-foundation.org>, Robin Murphy <robin.murphy@....com>, linux-kernel <linux-kernel@...r.kernel.org>, Hanjun Guo <guohanjun@...wei.com>, Libin <huawei.libin@...wei.com>, Jinyue Li <lijinyue@...wei.com>, Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH v2 1/3] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction

On 2017/10/18 20:58, Will Deacon wrote:
> Hi Thunder,
>
> On Tue, Sep 12, 2017 at 09:00:36PM +0800, Zhen Lei wrote:
>> Because all TLBI commands should be followed by a SYNC command, to make
>> sure that it has been completely finished. So we can just add the TLBI
>> commands into the queue, and put off the execution until meet SYNC or
>> other commands. To prevent the followed SYNC command waiting for a long
>> time because of too many commands have been delayed, restrict the max
>> delayed number.
>>
>> According to my test, I got the same performance data as I replaced writel
>> with writel_relaxed in queue_inc_prod.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@...wei.com>
>> ---
>> drivers/iommu/arm-smmu-v3.c | 42 +++++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 37 insertions(+), 5 deletions(-)
>
> If we want to go down the route of explicit command batching, I'd much
> rather do it by implementing the iotlb_range_add callback in the driver,
> and have a fixed-length array of batched ranges on the domain. We could

I think this patch is still valuable even if the iotlb_range_add callback is implemented. The main purpose of this patch is to reduce the number of dsb operations. Consider the scenario with iotlb_range_add implemented:

.iotlb_range_add:
    spin_lock_irqsave(&smmu->cmdq.lock, flags);
    ...
    add tlbi range-1 to cmdq
    ...
    add tlbi range-n to cmdq    //n
    dsb
    ...
    spin_unlock_irqrestore(&smmu->cmdq.lock, flags);

.iotlb_sync:
    spin_lock_irqsave(&smmu->cmdq.lock, flags);
    ...
    add cmd_sync to cmdq
    dsb
    ...
    spin_unlock_irqrestore(&smmu->cmdq.lock, flags);

Although iotlb_range_add can save n-1 dsb operations, one dsb is still left in .iotlb_range_add. If n is not large enough, this patch is still helpful, because it defers that remaining one as well.

> potentially toggle this function pointer based on the compatible string too,
> if it shows only to benefit some systems.

[
On 2017/9/19 12:31, Nate Watterson wrote:
I tested these (2) patches on QDF2400 hardware and saw performance
improvements in line with those I reported when testing the original
series.
]

I'm not sure whether this particular patch improves performance on QDF2400, because the two patches were tested together. But at least it seems harmless, and the same is probably true of other hardware platforms.

>
> Will
>
> .
>

--
Thanks!
Best Regards