Message-ID: <c58abbec-7220-b440-98d4-d1026a8feed4@huawei.com>
Date: Mon, 8 Mar 2021 16:22:12 +0000
From: John Garry <john.garry@...wei.com>
To: Robin Murphy <robin.murphy@....com>,
"Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>,
Will Deacon <will@...nel.org>, Joerg Roedel <joro@...tes.org>,
iommu <iommu@...ts.linux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>
CC: Vijayanand Jitta <vjitta@...eaurora.org>,
Linuxarm <linuxarm@...wei.com>
Subject: Re: [PATCH 1/1] Revert "iommu/iova: Retry from last rb tree node if
iova search fails"
On 08/03/2021 15:15, Robin Murphy wrote:
>> I figure that you're talking about 4e89dce72521 now. I would have
>> liked to know which real-life problem it solved in practice.
>
> From what I remember, the problem reported was basically the one
> illustrated in that commit and the one I alluded to above - namely that
> certain allocation patterns with a broad mix of sizes and relative
> lifetimes end up pushing the cached PFN down to the bottom of the
> address space such that allocations start failing despite there still
> being sufficient free space overall, which was breaking some media
> workload. What was originally proposed was an overcomplicated palaver
> with DMA attributes and a whole extra allocation algorithm rather than
> just fixing the clearly unintended and broken behaviour.
ok, fine. I just wondered if this was a theoretical problem only.
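For my own reference, my understanding of what 4e89dce72521 changes in
__alloc_and_insert_iova_range() is roughly the following - a simplified
sketch from memory rather than the exact upstream code, so names may be
slightly off:

	curr = __get_cached_rbnode(iovad, limit_pfn);
	curr_iova = rb_entry(curr, struct iova, node);
	retry_pfn = curr_iova->pfn_hi + 1;	/* remember where we started */

retry:
	do {
		/* walk down from the cached node looking for a free hole */
		high_pfn = min(high_pfn, curr_iova->pfn_lo);
		new_pfn = (high_pfn - size) & align_mask;
		prev = curr;
		curr = rb_prev(curr);
		curr_iova = rb_entry(curr, struct iova, node);
	} while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);

	if (high_pfn < size || new_pfn < low_pfn) {
		if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
			/*
			 * Nothing free below the cached node: rescan the
			 * space above it, from the top anchor downwards,
			 * before giving up.
			 */
			high_pfn = limit_pfn;
			low_pfn = retry_pfn;
			curr = &iovad->anchor.node;
			curr_iova = rb_entry(curr, struct iova, node);
			goto retry;
		}
		iovad->max32_alloc_size = size;
		goto iova32_full;
	}

i.e. without the retry, once the cached node has drifted to the bottom
of the space we only ever scan below it, and the allocation fails even
though there may be plenty of free space higher up.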
>
>>> While max32_alloc_size indirectly tracks the largest *contiguous*
>>> available space, one of the ideas from which it grew was to simply keep
>>> count of the total number of free PFNs. If you're really spending
>>> significant time determining that the tree is full, as opposed to just
>>> taking longer to eventually succeed, then it might be relatively
>>> innocuous to tack on that semi-redundant extra accounting as a
>>> self-contained quick fix for that worst case.
>>>
...
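[ Just so it's written down somewhere: the way I read the free-PFN
accounting idea is roughly the below, where "free_pfns" is a
hypothetical new counter in struct iova_domain, not something that
exists today. It can only say "definitely won't fit", never "will fit",
but that is exactly the full-tree worst case being targeted: ]

	/* in __alloc_and_insert_iova_range(), before walking the rbtree */
	if (limit_pfn <= iovad->dma_32bit_pfn && size > iovad->free_pfns)
		goto iova32_full;

	/* on a successful insert, and in the corresponding free path */
	iovad->free_pfns -= size;
	...
	iovad->free_pfns += iova_size(iova);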
>>
>> Even if it were configurable, wouldn't it make sense to have it
>> configurable per IOVA domain?
>
> Perhaps, but I don't see that being at all easy to implement. We can't
> arbitrarily *increase* the scope of caching once a domain is active due
> to the size-rounding-up requirement, which would be prohibitive to
> larger allocations if applied universally.
>
Agreed.
But having that option (all IOVA sizes being cacheable) available could
still be really useful for some situations, though.
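(For context, the rounding Robin refers to is the power-of-two round-up
which the DMA layer applies to any size small enough to be a candidate
for the rcaches - quoting roughly from memory, from
iommu_dma_alloc_iova():)

	/*
	 * Anything small enough to be cacheable is rounded up to a power of
	 * two so that a freed IOVA can actually be re-used from the rcache.
	 * Applying the same rounding to every size would waste far too much
	 * space for large mappings, hence the fixed upper limit on what is
	 * cached.
	 */
	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
		iova_len = roundup_pow_of_two(iova_len);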
>> Furthermore, as mentioned above, I still want to solve this IOVA aging
>> issue, and this fixed RCACHE RANGE size seems to be at the center
>> of that problem.
>>
>>>
>>>> As for 4e89dce72521, even if it's proper to retry a failed alloc, it
>>>> is not always necessary. I mean, if we're limiting ourselves to the
>>>> 32b subspace for this SAC trick and the alloc fails, then we can try
>>>> the space above 32b first (if usable), and only retry the 32b subspace
>>>> if that also fails. I don't see a need to retry the 32b subspace if
>>>> we're not limited to it. How about it? We tried that idea and it looks
>>>> to just about restore performance.
>>> The thing is, if you do have an actual PCI device where DAC might mean a
>>> 33% throughput loss and you're mapping a long-lived buffer, or you're on
>>> one of these systems where firmware fails to document address limits and
>>> using the full IOMMU address width quietly breaks things, then you
>>> almost certainly *do* want the allocator to actually do a proper job of
>>> trying to satisfy the given request.
>>
>> If those conditions were true, then it would seem quite a tenuous
>> position, so trying to help that scenario in general terms will have
>> limited efficacy.
>
> Still, I'd be curious to see if making the restart a bit cleverer offers
> a noticeable improvement. IIRC I suggested it at the time, but in the
> end the push was just to get *something* merged.
Sorry to say, I just tested that ("iommu/iova: Improve restart logic")
and there is no obvious improvement.
I'll have a further look at what might be going on.
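For reference, the idea we tried earlier (mentioned above) was roughly
along these lines in iommu_dma_alloc_iova() - a simplified sketch of
the approach rather than the actual patch:

	/* fragment; iovad/iova_len/shift/dma_limit as in the existing code */
	unsigned long iova = 0;

	if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev)) {
		/* cheap first pass below 4G: no rcache flush, no retry */
		iova = alloc_iova_fast(iovad, iova_len,
				       DMA_BIT_MASK(32) >> shift, false);
	}

	if (!iova) {
		/* then anything usable up to the real limit, still cheaply */
		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
				       false);
	}

	if (!iova) {
		/* only now put real effort (rcache flush + retry) into it */
		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
				       true);
	}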
Thanks very much,
John