Message-ID: <20250925085154.GW836419@horms.kernel.org>
Date: Thu, 25 Sep 2025 09:51:54 +0100
From: Simon Horman <horms@...nel.org>
To: Théo Lebrun <theo.lebrun@...tlin.com>
Cc: Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>,
Nicolas Ferre <nicolas.ferre@...rochip.com>,
Claudiu Beznea <claudiu.beznea@...on.dev>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
Harini Katakam <harini.katakam@...inx.com>,
Richard Cochran <richardcochran@...il.com>,
Russell King <linux@...linux.org.uk>, netdev@...r.kernel.org,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
Tawfik Bayouk <tawfik.bayouk@...ileye.com>,
Sean Anderson <sean.anderson@...ux.dev>
Subject: Re: [PATCH net v6 4/5] net: macb: single dma_alloc_coherent() for
DMA descriptors
On Tue, Sep 23, 2025 at 06:00:26PM +0200, Théo Lebrun wrote:
> Move from 2*NUM_QUEUES dma_alloc_coherent() calls for DMA descriptor
> rings to 2 calls overall: one for all Tx rings and one for all Rx rings.
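Illustrative sketch, not code from the patch: with one allocation per direction, each queue's ring sits at a fixed offset inside the shared buffer. The helpers ring_bytes() and ring_dma() are hypothetical names, and 'base' stands in for the DMA handle a single dma_alloc_coherent() would return. This is plain user-space C, not driver code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of one queue's descriptor ring.
 * desc_bytes is an assumed per-descriptor size, not taken from macb.h.
 */
static uint64_t ring_bytes(uint32_t ring_entries, uint32_t desc_bytes)
{
	return (uint64_t)ring_entries * desc_bytes;
}

/*
 * Hypothetical helper: DMA address of queue q's ring when all rings of
 * one direction are laid out back to back in a single buffer at 'base'.
 */
static uint64_t ring_dma(uint64_t base, uint32_t q, uint64_t bytes_per_ring)
{
	return base + (uint64_t)q * bytes_per_ring;
}
```

For example, with 512 entries of 16 bytes (8K per ring), queue 3's ring starts 0x6000 bytes after base.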
>
> The issue is that all queues share the same register for configuring
> the upper 32 bits of the Tx/Rx descriptor ring addresses. Taking Tx,
> notice how TBQPH does *not* depend on the queue index:
>
> #define GEM_TBQP(hw_q) (0x0440 + ((hw_q) << 2))
> #define GEM_TBQPH(hw_q) (0x04C8)
>
> queue_writel(queue, TBQP, lower_32_bits(queue->tx_ring_dma));
> #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> if (bp->hw_dma_cap & HW_DMA_CAP_64B)
>         queue_writel(queue, TBQPH, upper_32_bits(queue->tx_ring_dma));
> #endif
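A minimal user-space sketch of the resulting constraint, assuming the ring DMA addresses are already known: every queue's write to TBQPH lands in the same register, so the upper 32 bits the hardware ends up using apply to all queues, and every ring base must agree on them. upper32() mirrors what upper_32_bits() returns; rings_share_upper32() is a hypothetical helper, not something the driver defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit DMA address (same result as upper_32_bits()). */
static uint32_t upper32(uint64_t addr)
{
	return (uint32_t)(addr >> 32);
}

/*
 * Hypothetical check: TBQPH (and its Rx counterpart) is shared by all
 * queues, so the upper 32 bits of every queue's ring base must match.
 */
static bool rings_share_upper32(const uint64_t *ring_dma, size_t nrings)
{
	for (size_t i = 1; i < nrings; i++) {
		if (upper32(ring_dma[i]) != upper32(ring_dma[0]))
			return false;
	}
	return true;
}
```

With separate per-queue allocations nothing prevents one ring from landing at 0xFFFFE000 and another at 0x100000000, which this check rejects.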
>
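> To maximise our chances of getting valid DMA addresses, i.e. rings that
> all share the same upper 32 bits, we do a single dma_alloc_coherent()
> across queues. This improves the odds because alloc_pages() guarantees
> natural alignment: a buffer aligned to its own power-of-two size, and no
> larger than 4 GiB, cannot straddle a 4 GiB boundary. Other codepaths
> (IOMMU or dev/arch dma_map_ops) don't give strong enough guarantees
> (even page alignment isn't enough).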
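Sketch of the alignment argument, assuming a buffer aligned to its own size rounded up to a power of two (what an order-N alloc_pages() provides) and no larger than 4 GiB. crosses_4g(), pow2_roundup() and naturally_aligned() are hypothetical helpers written for illustration only. The test values contrast a 32K buffer that is 16K-aligned (page-aligned with 16K pages) yet crosses the 4 GiB line with a naturally aligned 32K buffer that cannot.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [base, base + size) touches two different 4 GiB windows. */
static bool crosses_4g(uint64_t base, uint64_t size)
{
	return size != 0 && (base >> 32) != ((base + size - 1) >> 32);
}

/* Smallest power of two >= n; assumes n <= 2^63 so the shift cannot overflow. */
static uint64_t pow2_roundup(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Natural alignment: base is a multiple of size rounded up to a power of two. */
static bool naturally_aligned(uint64_t base, uint64_t size)
{
	return (base & (pow2_roundup(size) - 1)) == 0;
}
```

This is why page alignment alone is not enough: a buffer spanning several pages can still start just below a 4 GiB boundary.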
>
> Two considerations:
>
> - dma_alloc_coherent() gives us page alignment. With a single shared
>   allocation we lose that guarantee per queue: rings after the first
>   are no longer guaranteed to be page-aligned.
>
> - This can save a small amount of memory. Fewer allocations mean
>   (1) less overhead (a constant cost per allocation) and (2) fewer
>   bytes wasted on alignment padding.
>
> Example for (2): 4 queues, default ring size (512), 64-bit DMA
> descriptors, 16K pages:
> - Before: 8 allocs of 8K, each rounded up to 16K => 8 x 8K = 64K wasted.
> - After: 2 allocs of 32K => 0K wasted.
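The arithmetic of the example above can be checked with a small sketch. It assumes 16-byte descriptors (so 512 entries come to 8K per ring) and models rounding up to whole pages only; page_waste() and total_waste() are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/* Bytes lost when an allocation of 'bytes' is rounded up to whole pages. */
static uint64_t page_waste(uint64_t bytes, uint64_t page_size)
{
	uint64_t rounded = (bytes + page_size - 1) / page_size * page_size;

	return rounded - bytes;
}

/* Total waste for 'nallocs' equally sized allocations. */
static uint64_t total_waste(unsigned int nallocs, uint64_t bytes,
			    uint64_t page_size)
{
	return (uint64_t)nallocs * page_waste(bytes, page_size);
}
```

Before: total_waste(8, 8K, 16K) gives 64K; after: total_waste(2, 32K, 16K) gives 0, since 4 rings of 8K fill exactly two 16K pages.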
>
> Fixes: 02c958dd3446 ("net/macb: add TX multiqueue support for gem")
> Reviewed-by: Sean Anderson <sean.anderson@...ux.dev>
> Acked-by: Nicolas Ferre <nicolas.ferre@...rochip.com>
> Tested-by: Nicolas Ferre <nicolas.ferre@...rochip.com> # on sam9x75
> Signed-off-by: Théo Lebrun <theo.lebrun@...tlin.com>
Reviewed-by: Simon Horman <horms@...nel.org>