netdev - Re: [PATCH] net: filter: Convert the BPF VM to threaded code


Message-ID: <1312005899.2873.70.camel@edumazet-laptop>
Date:	Sat, 30 Jul 2011 08:04:59 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Rui Ueyama <rui314@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: [PATCH] net: filter: Convert the BPF VM to threaded code

On Friday, 29 July 2011 at 22:09 -0700, Rui Ueyama wrote:
> The benchmark results look good. A simple benchmark that sends 10M UDP
> packets to lo took 76.24 seconds on average on a Core 2 Duo L7500@...GHz when
> tcpdump is running. With this patch it took 75.41 seconds, which means we save
> about 80ns per packet on that processor.
> 
> I think converting the VM to threaded code is low-hanging fruit, even if we'd
> have JIT compilers for popular architectures. Most of the lines in my patch
> are indentation changes, so the actual change is not big.
> 
...
> Tcpdump I used is this: tcpdump -p -n -s -i lo net 192.168.2.0/24
> 

Thanks for providing numbers. Was it on a 32-bit or a 64-bit kernel?

Have you tested with a cold instruction cache?

Your patch adds 540 bytes of code, so it's a potential latency increase.

# size net/core/filter.o net/core/filter.o.old
   text	   data	    bss	    dec	    hex	filename
   4243	      0	      0	   4243	   1093	net/core/filter.o
   3703	     24	      0	   3727	    e8f	net/core/filter.o.old

Each 'NEXT' translates to:

 4db:	83 c3 08             	add    $0x8,%ebx
 4de:	0f b7 03             	movzwl (%ebx),%eax
 4e1:	8b 04 85 00 02 00 00 	mov    0x200(,%eax,4),%eax
 4e8:	ff e0                	jmp    *%eax


And this is on i386; expect more on CPUs with fixed 32-bit
instructions...
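
For readers unfamiliar with the technique, here is a minimal sketch of what
a threaded-code 'NEXT' looks like in C. It relies on the GCC/Clang "labels
as values" extension. The opcode names, struct insn and run_threaded() are
made up for illustration and are not taken from the patch under review.

```c
#include <stdint.h>

/* Hypothetical opcodes, not the kernel's BPF_S_* values. */
enum { OP_ADD_K, OP_MUL_K, OP_RET_A, OP_MAX };

/* 8 bytes per instruction on common ABIs (16-bit code, padding, 32-bit k). */
struct insn {
	uint16_t code;	/* opcode, used as an index into the label table */
	uint32_t k;	/* immediate operand */
};

static uint32_t run_threaded(const struct insn *pc)
{
	/* One handler address per opcode (GCC/Clang extension). */
	static const void *const jump_table[OP_MAX] = {
		[OP_ADD_K] = &&do_add_k,
		[OP_MUL_K] = &&do_mul_k,
		[OP_RET_A] = &&do_ret_a,
	};
	uint32_t A = 0;

/* Advance, load the opcode, look up its handler, jump indirectly. */
#define NEXT do { pc++; goto *jump_table[pc->code]; } while (0)

	/* Dispatch the first instruction without advancing. */
	goto *jump_table[pc->code];

do_add_k:
	A += pc->k;
	NEXT;
do_mul_k:
	A *= pc->k;
	NEXT;
do_ret_a:
	return A;
#undef NEXT
}
```

Every handler ends with its own copy of the dispatch sequence, which is
where the extra text size comes from. The sketch does not validate opcodes;
it assumes the program was checked beforehand.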

We can remove one branch per BPF instruction with the following patch:

diff --git a/net/core/filter.c b/net/core/filter.c
index 36f975f..377f3ca 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -119,16 +119,14 @@ unsigned int sk_run_filter(const struct sk_buff *skb,
 	u32 tmp;
 	int k;
 
+	fentry--;
 	/*
 	 * Process array of filter instructions.
 	 */
-	for (;; fentry++) {
-#if defined(CONFIG_X86_32)
+	for (;;) {
 #define	K (fentry->k)
-#else
-		const u32 K = fentry->k;
-#endif
 
+		fentry++;
 		switch (fentry->code) {
 		case BPF_S_ALU_ADD_X:
 			A += X;
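
To illustrate the loop shape the patch above aims for, here is a small
standalone sketch (not kernel code) with a toy switch-based interpreter in
both forms. The names vm_insn, VM_* and the two run_* functions are
hypothetical. The assumption is that each case ends in 'continue', so with
the increment in the for header the compiler may jump to the increment and
then again to the switch, while with the increment in front of the switch a
single jump back is enough. Whether a given compiler really emits fewer
branches has to be checked in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set for illustration only. */
enum { VM_ADD_K, VM_SUB_K, VM_RET_A };

struct vm_insn {
	uint16_t code;
	uint32_t k;
};

/* Original shape: the increment lives in the for header. */
static uint32_t run_for_increment(const struct vm_insn *prog)
{
	uint32_t A = 0;
	size_t i;

	for (i = 0;; i++) {
		switch (prog[i].code) {
		case VM_ADD_K: A += prog[i].k; continue;
		case VM_SUB_K: A -= prog[i].k; continue;
		case VM_RET_A: return A;
		default:       return 0;
		}
	}
}

/* Patched shape: step back once, then increment right before the switch. */
static uint32_t run_increment_at_top(const struct vm_insn *prog)
{
	uint32_t A = 0;
	/*
	 * The patch decrements the pointer itself (fentry--). An unsigned
	 * index is used here so the wrap-around stays well defined in C.
	 */
	size_t i = 0;

	i--;
	for (;;) {
		i++;
		switch (prog[i].code) {
		case VM_ADD_K: A += prog[i].k; continue;
		case VM_SUB_K: A -= prog[i].k; continue;
		case VM_RET_A: return A;
		default:       return 0;
		}
	}
}
```

Both functions compute the same result; only the placement of the increment
relative to the dispatch differs.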


