[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z93AoLeUSUY6dj4U@pollux>
Date: Fri, 21 Mar 2025 20:40:16 +0100
From: Danilo Krummrich <dakr@...nel.org>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Abdiel Janulgue <abdiel.janulgue@...il.com>,
rust-for-linux@...r.kernel.org, daniel.almeida@...labora.com,
robin.murphy@....com, aliceryhl@...gle.com,
Miguel Ojeda <ojeda@...nel.org>,
Alex Gaynor <alex.gaynor@...il.com>,
Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <benno.lossin@...ton.me>,
Andreas Hindborg <a.hindborg@...nel.org>,
Trevor Gross <tmgross@...ch.edu>,
Valentin Obst <kernel@...entinobst.de>,
open list <linux-kernel@...r.kernel.org>,
Christoph Hellwig <hch@....de>,
Marek Szyprowski <m.szyprowski@...sung.com>, airlied@...hat.com,
"open list:DMA MAPPING HELPERS" <iommu@...ts.linux.dev>
Subject: Re: [PATCH v14 02/11] rust: add dma coherent allocator abstraction.
On Fri, Mar 21, 2025 at 03:25:39PM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 11, 2025 at 07:47:58PM +0200, Abdiel Janulgue wrote:
> > +pub struct CoherentAllocation<T: AsBytes + FromBytes> {
> > + dev: ARef<Device>,
> > + dma_handle: bindings::dma_addr_t,
> > + count: usize,
> > + cpu_addr: *mut T,
> > + dma_attrs: Attrs,
> > +}
>
> I'd like to point out how memory wasteful this is from what real
> drivers are doing today when they use the coherent API. Let's compare
> against SMMUv3's use for the CD table..
>
> This would be the code in arm_smmu_alloc_cd_ptr()
>
> It is making a 2 level radix tree.
>
> The cpu_addr is stored in a linear array of pointers:
>
> struct arm_smmu_cdtab_l2 **l2ptrs;
>
> The dma_addr is encoded into the HW data structure itself:
>
> arm_smmu_write_cd_l1_desc(&cd_table->l2.l1tab[idx],
> l2ptr_dma);
>
> The size of the allocation is fixed size:
> *l2ptr = dma_alloc_coherent(smmu->dev, sizeof(**l2ptr),
> ^^^^^^^^^^^^
> &l2ptr_dma, GFP_KERNEL);
>
> It doesn't need a struct device pointer or reference because this uses
> the usual kernel 'fence' reasoning for destruction.
>
> It doesn't even use dma_attrs. (why is this in a long term struct?)
>
> So, smmu manages to do this with a single array of 8 bytes/entry to shadow
> the CPU pointer, and recovers the dma_addr from the HW data structure:
>
> dma_free_coherent(smmu->dev,
> sizeof(*cd_table->l2.l2ptrs[i]),
> cd_table->l2.l2ptrs[i],
> arm_smmu_cd_l1_get_desc(&cd_table->l2.l1tab[i]));
>
> Basically, it was designed to be very memory efficient.
>
> If we imagine driving the same HW in rust the array storing the CPU
> pointer would have to expand to 40 bytes/entry to hold every
> CoherentAllocation. This means rust would need a new high order memory
> allocation to hold the CoherentAllocation memory array!
One solution that comes immediately to my mind is to have something like a
CoherentAllocationPool that holds all the repeating metadata and provides an API
to give you continuous allocations of the same kind without any bloat.
One thing I'd like to note though is that we can't have a perfect solutions for
everything from the get-go. We have to start somewhere and work it out
continuously and keep improving things as required.
So, don't get me wrong, I really think you have a valid point with that. But we
also can't consider every use-case from the get-go, push a huge amount of code
and then get questioned on why there's no user for all this. :) IMHO, it's about
finding a good balance.
Powered by blists - more mailing lists