Message-ID: <155508166807.23650.1698718263344400653.stgit@firesoul>
Date:   Fri, 12 Apr 2019 17:07:48 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     netdev@...r.kernel.org, Daniel Borkmann <borkmann@...earbox.net>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        "David S. Miller" <davem@...emloft.net>
Cc:     songliubraving@...com,
        Toke Høiland-Jørgensen <toke@...e.dk>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        ecree@...arflare.com, bpf@...r.kernel.org
Subject: [PATCH bpf-next V2 4/4] bpf: cpumap memory prefetchw optimizations
 for struct page

A lot of the performance gain in this series comes from this patch.

While analysing the performance overhead, it was found that the largest
CPU stalls were caused when touching the struct page memory area. The
area is first read, with a READ_ONCE from build_skb_around() via
page_is_pfmemalloc(), and is later written by the page_frag_free() call
when the frame is freed.
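
For context, a simplified and purely hypothetical userspace model of the
two struct page touch points the stall analysis refers to (field and
function names are illustrative, not the kernel's): a read on the build
path followed by a write on the free path.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page; fields are illustrative. */
struct page_model {
	long index;		/* read on the skb-build path */
	int frag_refcount;	/* written on the free path */
};

/* Read side, analogous to page_is_pfmemalloc() in build_skb_around(). */
static bool model_is_pfmemalloc(const struct page_model *p)
{
	return p->index == -1;
}

/* Write side, analogous to the refcount drop in page_frag_free(). */
static void model_frag_free(struct page_model *p)
{
	p->frag_refcount--;
}

int main(void)
{
	struct page_model p = { .index = -1, .frag_refcount = 1 };

	printf("pfmemalloc: %d\n", model_is_pfmemalloc(&p)); /* read  */
	model_frag_free(&p);                                  /* write */
	return 0;
}

Because the same cache line is both read and then written, a plain read
prefetch would still leave the later write waiting on ownership of the
line; this is what motivates the write variant below.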

Measurements show that the prefetchw (write) variant of the operation is
needed to achieve the performance gain. We believe this optimization is
twofold: first, the W-variant saves one step in the cache-coherency
protocol, and second, it helps us avoid the non-temporal prefetch HW
optimizations and brings the data into all cache levels. It might be
worth investigating whether a prefetch into L2 only would have the same
benefit.
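
For illustration, a minimal self-contained userspace sketch (not part of
this patch) using the GCC/Clang __builtin_prefetch() builtin, whose
second argument selects a read (0) or write (1) hint, mirroring the
kernel's prefetch()/prefetchw() pair:

#include <stdio.h>

struct fake_page {
	long index;
	int refcount;
};

int main(void)
{
	static struct fake_page pages[64];

	for (int i = 0; i < 64; i++) {
		/* rw=1 requests the line for writing (prefetchw-like);
		 * rw=0 would request it for reading only (prefetch-like).
		 * The last argument is the temporal-locality hint (0-3),
		 * 3 meaning keep the line in all cache levels. */
		__builtin_prefetch(&pages[i], 1, 3);
		pages[i].refcount = 1;	/* the write the hint anticipates */
	}
	printf("refcount[63] = %d\n", pages[63].refcount);
	return 0;
}

On x86, the write hint typically compiles to the PREFETCHW instruction
when the target supports it, requesting the cache line in an exclusive
state so the subsequent store does not need a separate
read-for-ownership transition.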

Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@...aro.org>
Acked-by: Song Liu <songliubraving@...com>
---
 kernel/bpf/cpumap.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 732d6ced3987..cf727d77c6c6 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -280,6 +280,18 @@ static int cpu_map_kthread_run(void *data)
 		 * consume side valid as no-resize allowed of queue.
 		 */
 		n = ptr_ring_consume_batched(rcpu->queue, frames, CPUMAP_BATCH);
+
+		for (i = 0; i < n; i++) {
+			void *f = frames[i];
+			struct page *page = virt_to_page(f);
+
+			/* Bring struct page memory area to curr CPU. Read by
+			 * build_skb_around via page_is_pfmemalloc(), and when
+			 * freed written by page_frag_free call.
+			 */
+			prefetchw(page);
+		}
+
 		m = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, n, skbs);
 		if (unlikely(m == 0)) {
 			for (i = 0; i < n; i++)
