Message-ID: <9aba4d8b-6a46-6c08-2568-2e7490723526@oracle.com>
Date: Mon, 21 Aug 2023 12:35:14 +0100
From: John Garry <john.g.garry@...cle.com>
To: Robin Murphy <robin.murphy@....com>, joro@...tes.org
Cc: will@...nel.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, zhangzekun11@...wei.com
Subject: Re: [PATCH 0/2] iommu/iova: Make the rcache depot properly flexible
On 16/08/2023 16:10, Robin Murphy wrote:
> On 15/08/2023 2:35 pm, John Garry wrote:
>> On 15/08/2023 12:11, Robin Murphy wrote:
>>>>
>>>> This threshold is the number of online CPUs, right?
>>>
>>> Yes, that's nominally half of the current fixed size (based on all
>>> the performance figures from the original series seemingly coming
>>> from a 16-thread machine),
>>
>> If you are talking about
>> https://lore.kernel.org/linux-iommu/20230811130246.42719-1-zhangzekun11@huawei.com/,
>
> No, I mean the *original* rcache patch submission, and its associated
> paper:
>
> https://lore.kernel.org/linux-iommu/cover.1461135861.git.mad@cs.technion.ac.il/
oh, that one :)
>> then I think it's a 256-CPU system and the DMA controller has 16 HW
>> queues. The 16 HW queues are relevant as each per-completion-queue
>> interrupt handler runs on a fixed CPU from the set of 16 CPUs in that
>> HW queue's interrupt affinity mask. What this means is that while any
>> CPU may alloc an IOVA, only those 16 CPUs handling the HW queue
>> interrupts will be freeing IOVAs.
>>
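To illustrate that alloc/free imbalance, here's a throwaway userspace
sketch (made-up round-robin submission pattern, not the real rcache code)
showing that only the 16 completion CPUs ever see frees:

/* alloc-vs-free CPU imbalance sketch - not the real rcache code */
#include <stdio.h>

#define NR_CPUS        256	/* CPUs which may allocate */
#define NR_HW_QUEUES    16	/* CPUs which see completion interrupts */
#define NR_REQUESTS 100000

int main(void)
{
	long freed_on_cpu[NR_CPUS] = { 0 };
	int populated = 0;

	for (long i = 0; i < NR_REQUESTS; i++) {
		/* submission (and IOVA alloc) may run on any CPU ... */
		int alloc_cpu = i % NR_CPUS;
		/* ... but completion (and IOVA free) lands on one of the
		 * 16 CPUs in the queue's interrupt affinity mask
		 */
		int free_cpu = (alloc_cpu % NR_HW_QUEUES) *
			       (NR_CPUS / NR_HW_QUEUES);

		freed_on_cpu[free_cpu]++;
	}

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (freed_on_cpu[cpu])
			populated++;

	printf("per-CPU caches ever refilled by a free: %d of %d\n",
	       populated, NR_CPUS);
	return 0;
}

Obviously that proves nothing about the real driver on its own - it is
just the shape of the imbalance I am describing.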
>>> but seemed like a fair compromise. I am of course keen to see how
>>> real-world testing actually pans out.
>>>
>>>>> it's enough of a challenge to get my 4-core dev board with spinning
>>>>> disk
>>>>> and gigabit ethernet to push anything into a depot at all 😄
>>>>>
>>>>
>>>> I have to admit that I was hoping to also see a more aggressive
>>>> reclaim strategy, where we also trim the per-CPU rcaches when not in
>>>> use. Leizhen proposed something like this a long time ago.
>>>
>>> Don't think I haven't been having various elaborate ideas for making
>>> it cleverer with multiple thresholds and self-tuning, however I have
>>> managed to restrain myself 😉
>>>
>>
>> OK, understood. My main issue WRT scalability is that the total
>> number of cacheable IOVAs (per-CPU and depot rcaches) scales up with
>> the number of CPUs, but many DMA controllers have a fixed limit on
>> max in-flight requests.
>>
>> Consider a SCSI storage controller on a 256-CPU system. The in-flight
>> limit for this example controller is 4096, which would typically never
>> even be fully used and may not even be usable.
>>
>> For this device, we need 4096 * 6 [IOVA rcache ranges] = ~24K cached
>> IOVAs if we were to pre-allocate them all - obviously I am ignoring
>> that we have the per-CPU rcaches for speed and that it would not make
>> sense to share one set. However, according to the current IOVA driver,
>> we can in theory cache up to ((256 [CPUs] * 2 [loaded + prev]) + 32
>> [depot size]) * 6 [rcache ranges] * 128 [IOVAs per mag] = ~420K IOVAs.
>> That's ~17x what we would ever need.
>>
>> Something like NVMe is different, as its total requests can scale up
>> with the CPU count, but only to a limit. I am not sure about network
>> controllers.
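For what it's worth, the arithmetic above in a form which can be
re-checked (plain userspace C, numbers copied from the example, nothing
driver-specific):

/* back-of-envelope numbers from the example above - just arithmetic */
#include <stdio.h>

#define NR_CPUS            256
#define MAGS_PER_CPU         2	/* loaded + prev */
#define DEPOT_MAGS          32	/* current fixed depot size */
#define RCACHE_RANGES        6	/* cached IOVA size ranges */
#define IOVAS_PER_MAG      128
#define CTRL_MAX_INFLIGHT 4096	/* example SCSI controller queue depth */

int main(void)
{
	long need = (long)CTRL_MAX_INFLIGHT * RCACHE_RANGES;
	long cacheable = ((long)NR_CPUS * MAGS_PER_CPU + DEPOT_MAGS) *
			 RCACHE_RANGES * IOVAS_PER_MAG;

	printf("controller worst-case need : %ld cached IOVAs\n", need);
	printf("rcache worst-case capacity : %ld cached IOVAs\n", cacheable);
	printf("ratio                      : ~%ldx\n", cacheable / need);
	return 0;
}

which reproduces the ~17x figure quoted above.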
>
> Remember that this threshold only represents a point at which we
> consider the cache to have grown "big enough" to start background
> reclaim - over the short term it is neither an upper nor a lower limit
> on the cache capacity itself. Indeed it will be larger than the working
> set of some workloads, but then it still wants to be enough of a buffer
> to be useful for others which do make big bursts of allocations only
> periodically.
>
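Understood. So, if I have the intended behaviour right, it is roughly
this (hypothetical names, a userspace sketch of the idea only, not the
patch itself):

/* sketch of "threshold triggers background reclaim" - not the patch */
#include <stdio.h>

#define DEPOT_THRESHOLD 16	/* e.g. num_online_cpus() on a 16-CPU box */

static int depot_mags;		/* magazines currently parked in the depot */
static int reclaim_pending;	/* stands in for a scheduled delayed work */

/* a CPU cache overflows a full magazine into the depot */
static void depot_push(void)
{
	depot_mags++;				/* no hard upper limit ... */
	if (depot_mags > DEPOT_THRESHOLD)	/* ... just a trigger point */
		reclaim_pending = 1;
}

/* background work: trim the depot back down to the threshold */
static void reclaim_work(void)
{
	while (depot_mags > DEPOT_THRESHOLD)
		depot_mags--;			/* free the excess magazines */
	reclaim_pending = 0;
}

int main(void)
{
	/* a burst of frees can still overshoot the threshold ... */
	for (int i = 0; i < 40; i++)
		depot_push();
	printf("after burst: %d magazines, reclaim pending: %d\n",
	       depot_mags, reclaim_pending);

	/* ... and is only trimmed later, off the fast path */
	if (reclaim_pending)
		reclaim_work();
	printf("after work : %d magazines\n", depot_mags);
	return 0;
}

i.e. a burst can still overshoot the threshold and is only trimmed
afterwards, off the fast path.
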
It would be interesting to see what zhangzekun finds for this series. He
was testing on a 5.10-based kernel - things have changed a lot since
then and I am not really sure what the problem could have been there.
>> Anyway, this is just something which I think should be considered -
>> which I guess already has been.
>
> Indeed, I would tend to assume that machines with hundreds of CPUs are
> less likely to be constrained on overall memory and/or IOVA space,
Cheers,
John