lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 28 Apr 2022 17:59:11 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Andi Kleen <ak@...ux.intel.com>,
        Christoph Hellwig <hch@...radead.org>
Cc:     parri.andrea@...il.com, thomas.lendacky@....com,
        wei.liu@...nel.org, Tianyu Lan <Tianyu.Lan@...rosoft.com>,
        konrad.wilk@...cle.com, linux-hyperv@...r.kernel.org,
        Tianyu Lan <ltykernel@...il.com>, linux-kernel@...r.kernel.org,
        michael.h.kelley@...rosoft.com, iommu@...ts.linux-foundation.org,
        andi.kleen@...el.com, brijesh.singh@....com, vkuznets@...hat.com,
        kys@...rosoft.com, kirill.shutemov@...el.com, hch@....de
Subject: Re: [RFC PATCH 1/2] swiotlb: Split up single swiotlb lock

On 2022-04-28 17:02, Andi Kleen wrote:
> 
> On 4/28/2022 8:07 AM, Robin Murphy wrote:
>> On 2022-04-28 15:55, Andi Kleen wrote:
>>>
>>> On 4/28/2022 7:45 AM, Christoph Hellwig wrote:
>>>> On Thu, Apr 28, 2022 at 03:44:36PM +0100, Robin Murphy wrote:
>>>>> Rather than introduce this extra level of allocator complexity, how 
>>>>> about
>>>>> just dividing up the initial SWIOTLB allocation into multiple 
>>>>> io_tlb_mem
>>>>> instances?
>>>> Yeah.  We're almost done removing all knowledge of swiotlb from 
>>>> drivers,
>>>> so the very last thing I want is an interface that allows a driver to
>>>> allocate a per-device buffer.
>>>
>>> At least for TDX need parallelism with a single device for performance.
>>>
>>> So if you split up the io tlb mems for a device then you would need a 
>>> new mechanism to load balance the requests for single device over 
>>> those. I doubt it would be any simpler.
>>
>> Eh, I think it would be, since the round-robin retry loop can then 
>> just sit around the existing io_tlb_mem-based allocator, vs. the churn 
>> of inserting it in the middle, plus it's then really easy to 
>> statically distribute different starting points across different 
>> devices via dev->dma_io_tlb_mem if we wanted to.
>>
>> Admittedly the overall patch probably ends up about the same size, 
>> since it likely pushes a bit more complexity into swiotlb_init to 
>> compensate, but that's still a trade-off I like.
> 
> Unless you completely break the external API this will require a new 
> mechanism to search a list of io_tlb_mems for the right area to free into.
> 
> If the memory area not contiguous (like in the original patch) this will 
> be a O(n) operation on the number of io_tlb_mems, so it would get more 
> and more expensive on larger systems. Or you merge them all together (so 
> that the simple address arithmetic to look up the area works again), 
> which will require even more changes in the setup. Or you add hashing or 
> similar which will be even more complicated.
> 
> In the end doing it with a single io_tlb_mem is significantly simpler 
> and also more natural.

Sorry if "dividing up the initial SWIOTLB allocation" somehow sounded 
like "making multiple separate SWIOTLB allocations all over the place"?

I don't see there being any *functional* difference in whether a slice 
of the overall SWIOTLB memory is represented by 
"io_tlb_default_mem->areas[i]->blah" or "io_tlb_default_mem[i]->blah", 
I'm simply advocating for not churning the already-complex allocator 
internals by pushing the new complexity out to the margins instead.

Thanks,
Robin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ