linux-kernel - Re: [PATCH] iommu/iova: Bettering utilizing cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0322849d-dc1f-4e1c-a47a-463f3c301bdc@huawei.com>
Date: Mon, 24 Jun 2024 21:13:11 +0800
From: "zhangzekun (A)" <zhangzekun11@...wei.com>
To: Robin Murphy <robin.murphy@....com>, <joro@...tes.org>, <will@...nel.org>,
	<john.g.garry@...cle.com>
CC: <iommu@...ts.linux.dev>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] iommu/iova: Bettering utilizing cpu_rcaches in no-strict
 mode


在 2024/6/24 18:56, Robin Murphy 写道:
> On 2024-06-24 9:39 am, Zhang Zekun wrote:
>> Currently, when iommu working in no-strict mode, fq_flush_timeout()
>> will always try to free iovas on one cpu. Freeing the iovas from all
>> cpus on one cpu is not cache-friendly to iova_rcache, because it will
>> first filling up the cpu_rcache and then pushing iovas to the depot,
>> iovas in the depot will finally goto the underlying rbtree if the
>> depot_size is greater than num_online_cpus().
> 
> That is the design intent - if the excess magazines sit in the depot 
> long enough to be reclaimed then other CPUs didn't want them either. 
> We're trying to minimise the amount of unused cached IOVAs sitting 
> around wasting memory, since IOVA memory consumption has proven to be 
> quite significant on large systems.

Hi, Robin,

It does waste some memory after this change, but since we have been 
freeing iova on each cpu in strict-mode for years, I think this change 
seems reasonable to make the iova free logic identical to strict-mode.
This patch try to make the speed of allcating and freeing iova roughly 
same by better utilizing the iova_rcache, or we will more likely enter 
the slowpath. The save of memory consumption is actually at the cost of 
performance, I am not sure if we need such a optimization for no-strict 
mode which is mainly used for performance consideration.

> 
> As alluded to in the original cover letter, 100ms for IOVA_DEPOT_DELAY 
> was just my arbitrary value of "long enough" to keep the initial 
> implementation straightforward - I do expect that certain workloads 
> might benefit from tuning it differently, but without proof of what they 
> are and what they want, there's little justification for introducing 
> extra complexity and potential user ABI yet.
> 
>> Let fq_flush_timeout()
>> freeing iovas on cpus who call dma_unmap* APIs, can decrease the overall
>> time caused by fq_flush_timeout() by better utilizing the iova_rcache,
>> and minimizing the competition for the iova_rbtree_lock().
> 
> I would have assumed that a single CPU simply throwing magazines into 
> the depot list from its own percpu cache is quicker, or at least no 
> slower, than doing the same while causing additional contention/sharing 
> by interfering with other percpu caches as well. And where does the 
> rbtree come into that either way? If an observable performance issue 
> actually exists here, I'd like a more detailed breakdown to understand it.
> 
> Thanks,
> Robin.
> 

This patch is firstly intent to minimize the chance of softlock issue in 
fq_flush_timeout(), which is already dicribed erarlier in [1], which has 
beed applied in a commercial kernel[2] for years.

However, the later tests show that this single patch is not enough to 
fix the softlockup issue, since the root cause of softlockup is the 
underlying iova_rbtree_lock. In our softlockup scenarios, the average
time cost to get this spinlock is about 6ms.

[1] https://lkml.org/lkml/2023/2/16/124
[2] https://gitee.com/openeuler/kernel/tree/OLK-5.10/

Thanks,
Zekun