Message-ID: <c58abbec-7220-b440-98d4-d1026a8feed4@huawei.com>
Date: Mon, 8 Mar 2021 16:22:12 +0000
From: John Garry <john.garry@...wei.com>
To: Robin Murphy <robin.murphy@....com>,
"Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>,
Will Deacon <will@...nel.org>, Joerg Roedel <joro@...tes.org>,
iommu <iommu@...ts.linux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>
CC: Vijayanand Jitta <vjitta@...eaurora.org>,
Linuxarm <linuxarm@...wei.com>
Subject: Re: [PATCH 1/1] Revert "iommu/iova: Retry from last rb tree node if
iova search fails"
On 08/03/2021 15:15, Robin Murphy wrote:
>> I figure that you're talking about 4e89dce72521 now. I would have
>> liked to know which real-life problem it solved in practice.
>
> From what I remember, the problem reported was basically the one
> illustrated in that commit and the one I alluded to above - namely that
> certain allocation patterns with a broad mix of sizes and relative
> lifetimes end up pushing the cached PFN down to the bottom of the
> address space such that allocations start failing despite there still
> being sufficient free space overall, which was breaking some media
> workload. What was originally proposed was an overcomplicated palaver
> with DMA attributes and a whole extra allocation algorithm rather than
> just fixing the clearly unintended and broken behaviour.
ok, fine. I just wondered if this was a theoretical problem only.
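For my own reference, my understanding of what 4e89dce72521 changes in
__alloc_and_insert_iova_range() is roughly the following - a simplified
sketch from memory rather than the exact upstream code, so names may be
slightly off:

	curr = __get_cached_rbnode(iovad, limit_pfn);
	curr_iova = rb_entry(curr, struct iova, node);
	retry_pfn = curr_iova->pfn_hi + 1;	/* remember where we started */

retry:
	do {
		/* walk down from the cached node looking for a free hole */
		high_pfn = min(high_pfn, curr_iova->pfn_lo);
		new_pfn = (high_pfn - size) & align_mask;
		prev = curr;
		curr = rb_prev(curr);
		curr_iova = rb_entry(curr, struct iova, node);
	} while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);

	if (high_pfn < size || new_pfn < low_pfn) {
		if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
			/*
			 * Nothing free below the cached node: rescan the
			 * space above it, from the top anchor downwards,
			 * before giving up.
			 */
			high_pfn = limit_pfn;
			low_pfn = retry_pfn;
			curr = &iovad->anchor.node;
			curr_iova = rb_entry(curr, struct iova, node);
			goto retry;
		}
		iovad->max32_alloc_size = size;
		goto iova32_full;
	}

i.e. without the retry, once the cached node has drifted to the bottom
of the space we only ever scan below it, and the allocation fails even
though there may be plenty of free space higher up.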
>
>>> While max32_alloc_size indirectly tracks the largest *contiguous*
>>> available space, one of the ideas from which it grew was to simply keep
>>> count of the total number of free PFNs. If you're really spending
>>> significant time determining that the tree is full, as opposed to just
>>> taking longer to eventually succeed, then it might be relatively
>>> innocuous to tack on that semi-redundant extra accounting as a
>>> self-contained quick fix for that worst case.
>>>
...
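[ Just so it's written down somewhere: the way I read the free-PFN
accounting idea is roughly the below, where "free_pfns" is a
hypothetical new counter in struct iova_domain, not something that
exists today. It can only say "definitely won't fit", never "will fit",
but that is exactly the full-tree worst case being targeted: ]

	/* in __alloc_and_insert_iova_range(), before walking the rbtree */
	if (limit_pfn <= iovad->dma_32bit_pfn && size > iovad->free_pfns)
		goto iova32_full;

	/* on a successful insert, and in the corresponding free path */
	iovad->free_pfns -= size;
	...
	iovad->free_pfns += iova_size(iova);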
>>
>> Even if it were configurable, wouldn't it make sense to have it
>> configurable per IOVA domain?
>
> Perhaps, but I don't see that being at all easy to implement. We can't
> arbitrarily *increase* the scope of caching once a domain is active due
> to the size-rounding-up requirement, which would be prohibitive to
> larger allocations if applied universally.
>
Agreed.
But having that option (all IOVA sizes being cacheable) available could
still be really useful for some situations, though.
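(For context, the rounding Robin refers to is the power-of-two round-up
which the DMA layer applies to any size small enough to be a candidate
for the rcaches - quoting roughly from memory, from
iommu_dma_alloc_iova():)

	/*
	 * Anything small enough to be cacheable is rounded up to a power of
	 * two so that a freed IOVA can actually be re-used from the rcache.
	 * Applying the same rounding to every size would waste far too much
	 * space for large mappings, hence the fixed upper limit on what is
	 * cached.
	 */
	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
		iova_len = roundup_pow_of_two(iova_len);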
>> Furthermore, as mentioned above, I still want to solve this IOVA aging
>> issue, and this fixed RCACHE RANGE size seems to be at the center
>> of that problem.
>>
>>>
>>>> As for 4e89dce72521, even if it's proper to retry a failed alloc, it
>>>> is not always necessary. I mean, if we're limiting ourselves to the
>>>> 32b subspace for this SAC trick and the alloc fails, then we can try
>>>> the space above 32b first (if usable), and only retry the 32b subspace
>>>> if that also fails. I don't see a need to retry the 32b subspace if
>>>> we're not limited to it. How about it? We tried that idea and it looks
>>>> to just about restore performance.
>>> The thing is, if you do have an actual PCI device where DAC might mean a
>>> 33% throughput loss and you're mapping a long-lived buffer, or you're on
>>> one of these systems where firmware fails to document address limits and
>>> using the full IOMMU address width quietly breaks things, then you
>>> almost certainly *do* want the allocator to actually do a proper job of
>>> trying to satisfy the given request.
>>
>> If those conditions were true, then it would seem quite a tenuous
>> position, so trying to help that scenario in general terms will have
>> limited efficacy.
>
> Still, I'd be curious to see if making the restart a bit cleverer offers
> a noticeable improvement. IIRC I suggested it at the time, but in the
> end the push was just to get *something* merged.
Sorry to say, I just tested that ("iommu/iova: Improve restart logic")
and there is no obvious improvement.
I'll have a further look at what might be going on.
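For reference, the idea we tried earlier (mentioned above) was roughly
along these lines in iommu_dma_alloc_iova() - a simplified sketch of
the approach rather than the actual patch:

	/* fragment; iovad/iova_len/shift/dma_limit as in the existing code */
	unsigned long iova = 0;

	if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev)) {
		/* cheap first pass below 4G: no rcache flush, no retry */
		iova = alloc_iova_fast(iovad, iova_len,
				       DMA_BIT_MASK(32) >> shift, false);
	}

	if (!iova) {
		/* then anything usable up to the real limit, still cheaply */
		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
				       false);
	}

	if (!iova) {
		/* only now put real effort (rcache flush + retry) into it */
		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
				       true);
	}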
Thanks very much,
John