linux-kernel - Re: [PATCH v4] iommu: Optimise PCI SAC address trick


Message-ID: <f17ab40c-d5c6-e31b-7877-7452b612505c@oracle.com>
Date:   Thu, 15 Jun 2023 13:15:07 +0100
From:   John Garry <john.g.garry@...cle.com>
To:     Robin Murphy <robin.murphy@....com>,
        Jakub Kicinski <kuba@...nel.org>,
        Joerg Roedel <joro@...tes.org>
Cc:     will@...nel.org, iommu@...ts.linux.dev,
        linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v4] iommu: Optimise PCI SAC address trick

On 15/06/2023 12:41, Robin Murphy wrote:
>>
>> Sure, not the same problem.
>>
>> However when we switched storage drivers to use dma_opt_mapping_size() 
>> then performance is similar to iommu.forcedac=1 - that's what I found, 
>> anyway.
>>
>> This tells me that even though IOVA allocator performance is poor 
>> when the 32b space fills, it was those large IOVAs which don't fit in 
>> the rcache which were the major contributor to hogging the CPU in the 
>> allocator.
> 
> The root cause is that every time the last usable 32-bit IOVA is 
> allocated, the *next* PCI caller to hit the rbtree for a SAC allocation 
> is burdened with walking the whole 32-bit subtree to determine that it's 
> full again and re-set max32_alloc_size. That's the overhead that 
> forcedac avoids.
> 

Sure
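
To make the cost described above concrete, here is a toy model of the fast-fail hint. It is only a sketch under loud assumptions: the "32-bit space" is a hypothetical 8-slot array rather than an rbtree, every allocation has size 1, and all of the toy_* names are made up for illustration. Only the max32_alloc_size idea is taken from the discussion.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8	/* hypothetical size of the toy "32-bit" space */

struct toy_domain {
	bool used[TOY_SLOTS];
	size_t max32_alloc_size;	/* sizes >= this are known to fail */
	unsigned long walks;		/* how many full searches were done */
};

static void toy_init(struct toy_domain *d)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		d->used[i] = false;
	d->max32_alloc_size = TOY_SLOTS;	/* nothing known to fail yet */
	d->walks = 0;
}

/* Allocate one slot; returns its index, or -1 if the space is full. */
static int toy_alloc(struct toy_domain *d)
{
	const size_t size = 1;

	/* Fast path: a previous walk already proved this size cannot fit. */
	if (size >= d->max32_alloc_size)
		return -1;

	/* Slow path: search the whole space (stand-in for the rbtree walk). */
	d->walks++;
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!d->used[i]) {
			d->used[i] = true;
			return (int)i;
		}
	}

	/* Walk failed: remember that, so later callers fail fast. */
	d->max32_alloc_size = size;
	return -1;
}

/* Freeing any slot invalidates the hint, so the next caller walks again. */
static void toy_free(struct toy_domain *d, int slot)
{
	d->used[slot] = false;
	d->max32_alloc_size = TOY_SLOTS;
}
```

In this model, a single free followed by a single allocation refills the space, and the next caller after that pays for one more full search before the hint is set again. That is the pattern which repeats under a bursty workload.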

> In the storage case with larger buffers, dma_opt_mapping_size() also 
> means you spend less time in the rbtree, but because you're inherently 
> hitting it less often at all, since most allocations can now hopefully 
> be fulfilled by the caches.

Sure

> That's obviously moot when the mappings are 
> already small enough to be cached and the only reason for hitting the 
> rbtree is overflow/underflow in the depot because the working set is 
> sufficiently large and the allocation pattern sufficiently "bursty".

After a bit of checking, this is the same issue as 
https://lore.kernel.org/linux-iommu/20230329181407.3eed7378@kernel.org/, 
and indeed we would always be using rcache'able-sized mappings.

So you think that we are reaching the depot-full condition, when we 
start to free depot magazines in __iova_rcache_insert(), right? In my 
experience of storage testing, it takes a long time to get to this state.
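
For reference, the state I mean can be sketched with a toy model of the insert path. This is not the kernel code: the sizes are deliberately tiny hypothetical values, magazines are reduced to a fill count, and the toy_* names are invented for illustration. It only assumes the general shape of a two-magazine per-CPU cache backed by a bounded depot.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAG_SIZE	2	/* hypothetical entries per magazine */
#define TOY_DEPOT_MAGS	1	/* hypothetical depot capacity, in magazines */

struct toy_mag {
	size_t count;		/* number of cached IOVAs in this magazine */
};

struct toy_rcache {
	struct toy_mag loaded;
	struct toy_mag prev;
	size_t depot_size;		/* full magazines parked in the depot */
	unsigned long mags_freed;	/* magazines handed back to the tree */
};

static bool toy_mag_full(const struct toy_mag *m)
{
	return m->count == TOY_MAG_SIZE;
}

/* Cache one freed IOVA, spilling full magazines to the depot if needed. */
static void toy_rcache_insert(struct toy_rcache *rc)
{
	if (toy_mag_full(&rc->loaded)) {
		if (!toy_mag_full(&rc->prev)) {
			/* The other magazine still has room: swap them. */
			struct toy_mag tmp = rc->loaded;

			rc->loaded = rc->prev;
			rc->prev = tmp;
		} else {
			/* Both full: park 'loaded' in the depot if it fits... */
			if (rc->depot_size < TOY_DEPOT_MAGS)
				rc->depot_size++;
			else
				rc->mags_freed++;	/* ...else free it */
			rc->loaded.count = 0;	/* start a fresh magazine */
		}
	}
	rc->loaded.count++;
}
```

With these toy sizes the cache absorbs (2 + TOY_DEPOT_MAGS) * TOY_MAG_SIZE = 6 inserts before the first magazine has to be freed, and the seventh insert triggers it. With realistic sizes that threshold is far higher, which would fit with it taking a long time to reach.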

Thanks,
John