Message-ID: <ac0a09f2-5f05-4317-a1cf-b7135791c639@arm.com>
Date: Thu, 2 May 2024 16:49:48 +0100
From: Robin Murphy <robin.murphy@....com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: Christoph Hellwig <hch@....de>,
Marek Szyprowski <m.szyprowski@...sung.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
nex.sw.ncis.osdt.itp.upstreaming@...el.com, bpf@...r.kernel.org,
netdev@...r.kernel.org, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v4 6/7] page_pool: check for DMA sync shortcut
earlier
On 24/04/2024 9:52 am, Alexander Lobakin wrote:
> From: Alexander Lobakin <aleksander.lobakin@...el.com>
> Date: Tue, 23 Apr 2024 15:58:31 +0200
>
>> We can save a couple more function calls in the Page Pool code if we
>> check for dma_need_sync() earlier, just when we test pp->p.dma_sync.
>> Move both these checks into an inline wrapper and call the PP wrapper
>> over the generic DMA sync function only when both are true.
>> You can't cache the result of dma_need_sync() in &page_pool, as it may
>> change anytime if an SWIOTLB buffer is allocated or mapped.
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@...el.com>
>> ---
>> net/core/page_pool.c | 31 +++++++++++++++++--------------
>> 1 file changed, 17 insertions(+), 14 deletions(-)
>>
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index 6cf26a68fa91..87319c6365e0 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -398,16 +398,24 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
>>  	return page;
>>  }
>>  
>> -static void page_pool_dma_sync_for_device(const struct page_pool *pool,
>> -					  const struct page *page,
>> -					  unsigned int dma_sync_size)
>> +static void __page_pool_dma_sync_for_device(const struct page_pool *pool,
>> +					    const struct page *page,
>> +					    u32 dma_sync_size)
>>  {
>>  	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
>>  
>>  	dma_sync_size = min(dma_sync_size, pool->p.max_len);
>> -	dma_sync_single_range_for_device(pool->p.dev, dma_addr,
>> -					 pool->p.offset, dma_sync_size,
>> -					 pool->p.dma_dir);
>> +	__dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset,
>> +				     dma_sync_size, pool->p.dma_dir);
>
> Breh, this breaks builds on !DMA_NEED_SYNC systems, as this function
> isn't defined there, and my CI didn't catch it in time... I'll ifdef
> this in the next spin after some reviews for this one.
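For context on that build break: on !CONFIG_DMA_NEED_SYNC configs,
dma-mapping.h only provides dma_dev_need_sync() as a constant-false stub
and doesn't declare __dma_sync_single_for_device() at all, so any direct
reference to the double-underscore helper fails to build. Roughly this
shape, if I'm reading the header right (paraphrased from memory, exact
guards may differ):

#ifdef CONFIG_DMA_NEED_SYNC
void __dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
				  size_t size, enum dma_data_direction dir);

static inline bool dma_dev_need_sync(const struct device *dev)
{
	/* may flip at runtime, e.g. once a SWIOTLB buffer gets mapped */
	return !dev->dma_skip_sync;
}
#else
static inline bool dma_dev_need_sync(const struct device *dev)
{
	return false;	/* syncs compile away entirely */
}
/* no __dma_sync_single_for_device() declaration here at all */
#endif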
Hmm, the way things have ended up, do we even need this change? (Other
than factoring out the pool->dma_sync check, which is clearly nice.)
Since __page_pool_dma_sync_for_device() is never called directly, the
flow we always get is:
page_pool_dma_sync_for_device()
	dma_dev_need_sync()
	__page_pool_dma_sync_for_device()
		... // a handful of trivial arithmetic
		__dma_sync_single_for_device()
i.e. still effectively the same order of
"if (dma_dev_need_sync()) __dma_sync()" invocations as if we'd just used
the standard dma_sync(), so referencing the unwrapped internals only
spreads it out a bit more for no real benefit. As an experiment I tried
the diff below on top, effectively undoing this problematic part, and it
doesn't seem to make any appreciable difference in a before-and-after
comparison of the object code, at least for my arm64 build.
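To spell out why that's expected: once inlined, the standard helper is
itself just the check plus the __ call - roughly (again paraphrasing the
post-series dma-mapping.h):

static inline void dma_sync_single_for_device(struct device *dev,
		dma_addr_t addr, size_t size, enum dma_data_direction dir)
{
	/* the same fast-path test the v4 page_pool wrapper open-codes */
	if (dma_dev_need_sync(dev))
		__dma_sync_single_for_device(dev, addr, size, dir);
}

static inline void dma_sync_single_range_for_device(struct device *dev,
		dma_addr_t addr, unsigned long offset, size_t size,
		enum dma_data_direction dir)
{
	dma_sync_single_for_device(dev, addr + offset, size, dir);
}

so gating dma_sync_single_range_for_device() behind pool->dma_sync alone
yields the same "check need_sync, then sync" sequence after inlining,
without reaching into the internals.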
Thanks,
Robin.
----->8-----
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 27f3b6db800e..b8ab7d4ca229 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -398,24 +398,20 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
 	return page;
 }
 
-static void __page_pool_dma_sync_for_device(const struct page_pool *pool,
-					    const struct page *page,
-					    u32 dma_sync_size)
-{
-	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
-
-	dma_sync_size = min(dma_sync_size, pool->p.max_len);
-	__dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset,
-				     dma_sync_size, pool->p.dma_dir);
-}
-
 static __always_inline void
 page_pool_dma_sync_for_device(const struct page_pool *pool,
			       const struct page *page,
			       u32 dma_sync_size)
 {
-	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
-		__page_pool_dma_sync_for_device(pool, page, dma_sync_size);
+	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
+
+	if (!pool->dma_sync)
+		return;
+
+	dma_sync_size = min(dma_sync_size, pool->p.max_len);
+	dma_sync_single_range_for_device(pool->p.dev, dma_addr,
+					 pool->p.offset, dma_sync_size,
+					 pool->p.dma_dir);
 }
 static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
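P.S. for completeness, none of the above is exercised unless a pool is
created with sync-for-device enabled in the first place - i.e. something
like this hypothetical driver snippet (names and sizes made up):

	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.pool_size	= 256,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,	/* the device doing the DMA */
		.dma_dir	= DMA_FROM_DEVICE,
		.offset		= XDP_PACKET_HEADROOM,
		.max_len	= PAGE_SIZE - XDP_PACKET_HEADROOM,
	};
	struct page_pool *pool = page_pool_create(&pp);

which is what sets pool->dma_sync and the p.offset/p.max_len bounds used
in the sync path above.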
Powered by blists - more mailing lists