linux-kernel - Re: [PATCH 0/4] Optimise 64-bit IOVA allocations


Date:   Fri, 21 Jul 2017 17:48:25 +0800
From:   "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
To:     Robin Murphy <robin.murphy@....com>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>
CC:     Joerg Roedel <joro@...tes.org>, <iommu@...ts.linux-foundation.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        David Woodhouse <dwmw2@...radead.org>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        <Jonathan.Cameron@...wei.com>, <nwatters@...eaurora.org>,
        <ray.jui@...adcom.com>
Subject: Re: [PATCH 0/4] Optimise 64-bit IOVA allocations



On 2017/7/19 18:23, Robin Murphy wrote:
> On 19/07/17 09:37, Ard Biesheuvel wrote:
>> On 18 July 2017 at 17:57, Robin Murphy <robin.murphy@....com> wrote:
>>> Hi all,
>>>
>>> In the wake of the ARM SMMU optimisation efforts, it seems that certain
>>> workloads (e.g. storage I/O with large scatterlists) probably remain quite
>>> heavily influenced by IOVA allocation performance. Separately, Ard also
>>> reported massive performance drops for a graphical desktop on AMD Seattle
>>> when enabling SMMUs via IORT, which we traced to dma_32bit_pfn in the DMA
>>> ops domain getting initialised differently for ACPI vs. DT, and exposing
>>> the overhead of the rbtree slow path. Whilst we could go around trying to
>>> close up all the little gaps that lead to hitting the slowest case, it
>>> seems a much better idea to simply make said slowest case a lot less slow.
>>>
>>> I had a go at rebasing Leizhen's last IOVA series[1], but ended up finding
>>> the changes rather too hard to follow, so I've taken the liberty here of
>>> picking the whole thing up and reimplementing the main part in a rather
>>> less invasive manner.
>>>
>>> Robin.
>>>
>>> [1] https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg17753.html
>>>
>>> Robin Murphy (1):
>>>   iommu/iova: Extend rbtree node caching
>>>
>>> Zhen Lei (3):
>>>   iommu/iova: Optimise rbtree searching
>>>   iommu/iova: Optimise the padding calculation
>>>   iommu/iova: Make dma_32bit_pfn implicit
>>>
>>>  drivers/gpu/drm/tegra/drm.c      |   3 +-
>>>  drivers/gpu/host1x/dev.c         |   3 +-
>>>  drivers/iommu/amd_iommu.c        |   7 +--
>>>  drivers/iommu/dma-iommu.c        |  18 +------
>>>  drivers/iommu/intel-iommu.c      |  11 ++--
>>>  drivers/iommu/iova.c             | 112 ++++++++++++++++-----------------------
>>>  drivers/misc/mic/scif/scif_rma.c |   3 +-
>>>  include/linux/iova.h             |   8 +--
>>>  8 files changed, 60 insertions(+), 105 deletions(-)
>>>
>>
>> These patches look suspiciously like the ones I have been using over
>> the past couple of weeks (modulo the tegra and host1x changes) from
>> your git tree. They work fine on my AMD Overdrive B1, both in DT and
>> in ACPI/IORT modes, although it is difficult to quantify any
>> performance deltas on my setup.
> 
> Indeed - this is a rebase (to account for those new callers) with a
> couple of trivial tweaks to error paths and corner cases that normal
> usage shouldn't have been hitting anyway. "No longer unusably awful" is
> a good enough performance delta for me :)
> 
>> Tested-by: Ard Biesheuvel <ard.biesheuvel@...aro.org>
I measured the same performance with this series as with my own version of the patches. It works well.

Tested-by: Zhen Lei <thunder.leizhen@...wei.com>

> 
> Thanks!
> 
> Robin.
> 
> .
> 

-- 
Thanks!
Best Regards