linux-kernel - Re: [PATCH] swiotlb: allocate memory in a cache-friendly way

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20210917030334.GA20257@gao-cwp>
Date:   Fri, 17 Sep 2021 11:03:35 +0800
From:   Chao Gao <chao.gao@...el.com>
To:     Konrad Rzeszutek Wilk <konrad@...nok.org>
Cc:     Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        Christoph Hellwig <hch@....de>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Robin Murphy <robin.murphy@....com>,
        iommu@...ts.linux-foundation.org, linux-kernel@...r.kernel.org,
        Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH] swiotlb: allocate memory in a cache-friendly way

On Thu, Sep 16, 2021 at 11:49:39AM -0400, Konrad Rzeszutek Wilk wrote:
>On Wed, Sep 01, 2021 at 12:21:35PM +0800, Chao Gao wrote:
>> Currently, swiotlb uses a global index to indicate the starting point
>> of next search. The index increases from 0 to the number of slots - 1
>> and then wraps around. It is straightforward but not cache-friendly
>> because the "oldest" slot in swiotlb tends to be used first.
>> 
>> Freed slots are probably accessed right before being freed, especially
>> in VM's case (device backends access them in DMA_TO_DEVICE mode; guest
>> accesses them in other DMA modes). Thus those just freed slots may
>> reside in cache. Then reusing those just freed slots can reduce cache
>> misses.
>> 
>> To that end, maintain a free list for free slots and insert freed slots
>> from the head and searching for free slots always starts from the head.
>> 
>> With this optimization, network throughput of sending data from host to
>> guest, measured by iperf3, increases by 7%.
>
>Wow, that is pretty awesome!
>
>Are there any other benchmarks that you ran that showed a negative
>performance?

TBH, yes. Recently I do fio tests with this patch. The impact of this patch
is: (+ means performance improvement; - means performance regression)

1-job fio:
randread: +6.7%
randwrite: -1.6%
read: +8.2%
write: +7.4%

8-job fio:
randread: -5.5%
randwrite: -12.6%
read: -24.8%
write: -45.5%

I haven't figured out why multi-job fio tests suffer. Will post v2 once
the issue gets resolved.

Thanks
Chao