lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 22 Mar 2021 18:30:55 +0000
From:   Alexander Lobakin <alobakin@...me>
To:     "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Jesper Dangaard Brouer <hawk@...nel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Matteo Croce <mcroce@...ux.microsoft.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Alexander Lobakin <alobakin@...me>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: [PATCH net-next] page_pool: let the compiler optimize and inline core functions

As per disscussion in Page Pool bulk allocator thread [0],
there are two functions in Page Pool core code that are marked as
'noinline'. The reason for this is not so clear, and even if it
was made to reduce hotpath overhead, in fact it only makes things
worse.
As both of these functions as being called only once through the
code, they could be inlined/folded into the non-static entry point.
However, 'noinline' marks effectively prevent from doing that and
induce totally unneeded fragmentation (baseline -> after removal):

add/remove: 0/3 grow/shrink: 1/0 up/down: 1024/-1096 (-72)
Function                                     old     new   delta
page_pool_alloc_pages                        100    1124   +1024
page_pool_dma_map                            164       -    -164
page_pool_refill_alloc_cache                 332       -    -332
__page_pool_alloc_pages_slow                 600       -    -600

(taken from Mel's branch, hence factored-out page_pool_dma_map())

1124 is a normal hotpath frame size, but these jumps between tiny
page_pool_alloc_pages(), page_pool_refill_alloc_cache() and
__page_pool_alloc_pages_slow() are really redundant and harmful
for performance.

This simple removal of 'noinline' keywords bumps the throughput
on XDP_PASS + napi_build_skb() + napi_gro_receive() on 25+ Mbps
for 1G embedded NIC.

[0] https://lore.kernel.org/netdev/20210317222506.1266004-1-alobakin@pm.me

Signed-off-by: Alexander Lobakin <alobakin@...me>
---
 net/core/page_pool.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ad8b0707af04..589e4df6ef2b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -102,7 +102,6 @@ EXPORT_SYMBOL(page_pool_create);

 static void page_pool_return_page(struct page_pool *pool, struct page *page);

-noinline
 static struct page *page_pool_refill_alloc_cache(struct page_pool *pool)
 {
 	struct ptr_ring *r = &pool->ring;
@@ -181,7 +180,6 @@ static void page_pool_dma_sync_for_device(struct page_pool *pool,
 }

 /* slow path */
-noinline
 static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 						 gfp_t _gfp)
 {
--
2.31.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ