lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ac0a09f2-5f05-4317-a1cf-b7135791c639@arm.com>
Date: Thu, 2 May 2024 16:49:48 +0100
From: Robin Murphy <robin.murphy@....com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: Christoph Hellwig <hch@....de>,
 Marek Szyprowski <m.szyprowski@...sung.com>, Joerg Roedel <joro@...tes.org>,
 Will Deacon <will@...nel.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
 Magnus Karlsson <magnus.karlsson@...el.com>,
 nex.sw.ncis.osdt.itp.upstreaming@...el.com, bpf@...r.kernel.org,
 netdev@...r.kernel.org, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v4 6/7] page_pool: check for DMA sync shortcut
 earlier

On 24/04/2024 9:52 am, Alexander Lobakin wrote:
> From: Alexander Lobakin <aleksander.lobakin@...el.com>
> Date: Tue, 23 Apr 2024 15:58:31 +0200
> 
>> We can save a couple more function calls in the Page Pool code if we
>> check for dma_need_sync() earlier, just when we test pp->p.dma_sync.
>> Move both these checks into an inline wrapper and call the PP wrapper
>> over the generic DMA sync function only when both are true.
>> You can't cache the result of dma_need_sync() in &page_pool, as it may
>> change anytime if an SWIOTLB buffer is allocated or mapped.
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@...el.com>
>> ---
>>   net/core/page_pool.c | 31 +++++++++++++++++--------------
>>   1 file changed, 17 insertions(+), 14 deletions(-)
>>
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index 6cf26a68fa91..87319c6365e0 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -398,16 +398,24 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
>>   	return page;
>>   }
>>   
>> -static void page_pool_dma_sync_for_device(const struct page_pool *pool,
>> -					  const struct page *page,
>> -					  unsigned int dma_sync_size)
>> +static void __page_pool_dma_sync_for_device(const struct page_pool *pool,
>> +					    const struct page *page,
>> +					    u32 dma_sync_size)
>>   {
>>   	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
>>   
>>   	dma_sync_size = min(dma_sync_size, pool->p.max_len);
>> -	dma_sync_single_range_for_device(pool->p.dev, dma_addr,
>> -					 pool->p.offset, dma_sync_size,
>> -					 pool->p.dma_dir);
>> +	__dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset,
>> +				     dma_sync_size, pool->p.dma_dir);
> 
> Breh, this breaks builds on !DMA_NEED_SYNC systems, as this function
> isn't defined there, and my CI didn't catch it in time... I'll ifdef
> this in the next spin after some reviews for this one.

Hmm, the way things have ended up, do we even need this change? (Other
than factoring out the pool->dma_sync check, which is clearly nice)

Since __page_pool_dma_sync_for_device() is never called directly, the
flow we always get is:

page_pool_dma_sync_for_device()
     dma_dev_need_sync()
         __page_pool_dma_sync_for_device()
             ... // a handful of trivial arithmetic
             __dma_sync_single_for_device()

i.e. still effectively the same order of
"if (dma_dev_need_sync()) __dma_sync()" invocations as if we'd just used
the standard dma_sync(), so referencing the unwrapped internals only
spreads it out a bit more for no real benefit. As an experiment I tried
the diff below on top, effectively undoing this problematic part, and it
doesn't seem to make any appreciable difference in a before-and-after
comparison of the object code, at least for my arm64 build.

Thanks,
Robin.

----->8-----
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 27f3b6db800e..b8ab7d4ca229 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -398,24 +398,20 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
  	return page;
  }
  
-static void __page_pool_dma_sync_for_device(const struct page_pool *pool,
-					    const struct page *page,
-					    u32 dma_sync_size)
-{
-	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
-
-	dma_sync_size = min(dma_sync_size, pool->p.max_len);
-	__dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset,
-				     dma_sync_size, pool->p.dma_dir);
-}
-
  static __always_inline void
  page_pool_dma_sync_for_device(const struct page_pool *pool,
  			      const struct page *page,
  			      u32 dma_sync_size)
  {
-	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
-		__page_pool_dma_sync_for_device(pool, page, dma_sync_size);
+	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
+
+	if (!pool->dma_sync)
+		return;
+
+	dma_sync_size = min(dma_sync_size, pool->p.max_len);
+	dma_sync_single_range_for_device(pool->p.dev, dma_addr,
+					 pool->p.offset, dma_sync_size,
+					 pool->p.dma_dir);
  }
  
  static bool page_pool_dma_map(struct page_pool *pool, struct page *page)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ