Message-ID: <3686314d-4226-b360-a72d-267af52c8918@arm.com>
Date: Thu, 28 Apr 2022 17:59:11 +0100
From: Robin Murphy <robin.murphy@....com>
To: Andi Kleen <ak@...ux.intel.com>,
Christoph Hellwig <hch@...radead.org>
Cc: parri.andrea@...il.com, thomas.lendacky@....com,
wei.liu@...nel.org, Tianyu Lan <Tianyu.Lan@...rosoft.com>,
konrad.wilk@...cle.com, linux-hyperv@...r.kernel.org,
Tianyu Lan <ltykernel@...il.com>, linux-kernel@...r.kernel.org,
michael.h.kelley@...rosoft.com, iommu@...ts.linux-foundation.org,
andi.kleen@...el.com, brijesh.singh@....com, vkuznets@...hat.com,
kys@...rosoft.com, kirill.shutemov@...el.com, hch@....de
Subject: Re: [RFC PATCH 1/2] swiotlb: Split up single swiotlb lock
On 2022-04-28 17:02, Andi Kleen wrote:
>
> On 4/28/2022 8:07 AM, Robin Murphy wrote:
>> On 2022-04-28 15:55, Andi Kleen wrote:
>>>
>>> On 4/28/2022 7:45 AM, Christoph Hellwig wrote:
>>>> On Thu, Apr 28, 2022 at 03:44:36PM +0100, Robin Murphy wrote:
>>>>> Rather than introduce this extra level of allocator complexity, how
>>>>> about
>>>>> just dividing up the initial SWIOTLB allocation into multiple
>>>>> io_tlb_mem
>>>>> instances?
>>>> Yeah. We're almost done removing all knowledge of swiotlb from
>>>> drivers,
>>>> so the very last thing I want is an interface that allows a driver to
>>>> allocate a per-device buffer.
>>>
>>> At least for TDX we need parallelism with a single device for
>>> performance.
>>>
>>> So if you split up the io_tlb_mems for a device then you would need a
>>> new mechanism to load-balance the requests for a single device over
>>> them. I doubt it would be any simpler.
>>
>> Eh, I think it would be, since the round-robin retry loop can then
>> just sit around the existing io_tlb_mem-based allocator, vs. the churn
>> of inserting it in the middle, plus it's then really easy to
>> statically distribute different starting points across different
>> devices via dev->dma_io_tlb_mem if we wanted to.
>>
>> Admittedly the overall patch probably ends up about the same size,
>> since it likely pushes a bit more complexity into swiotlb_init to
>> compensate, but that's still a trade-off I like.
>
> Unless you completely break the external API this will require a new
> mechanism to search a list of io_tlb_mems for the right area to free into.
>
> If the memory area is not contiguous (like in the original patch) this
> will be an O(n) operation in the number of io_tlb_mems, so it would get
> more and more expensive on larger systems. Or you merge them all together
> (so that the simple address arithmetic to look up the area works again),
> which will require even more changes in the setup. Or you add hashing or
> similar, which will be even more complicated.
>
> In the end doing it with a single io_tlb_mem is significantly simpler
> and also more natural.
Sorry if "dividing up the initial SWIOTLB allocation" somehow sounded
like "making multiple separate SWIOTLB allocations all over the place"?
I don't see there being any *functional* difference in whether a slice
of the overall SWIOTLB memory is represented by
"io_tlb_default_mem->areas[i]->blah" or "io_tlb_default_mem[i]->blah";
I'm simply advocating for pushing the new complexity out to the margins
instead of churning the already-complex allocator internals.
Thanks,
Robin.