Date:	Fri, 29 Jul 2011 22:09:36 -0700
From:	Rui Ueyama <rui314@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: [PATCH] net: filter: Convert the BPF VM to threaded code

The benchmark results look good. A simple benchmark that sends 10M UDP
packets to lo took 76.24 seconds on average on a Core 2 Duo L7500@...GHz when
tcpdump is running. With this patch it took 75.41 seconds, which means we save
about 80ns per packet on that processor.

I think converting the VM to threaded code is low-hanging fruit, even if we
will have JIT compilers for popular architectures. Most of the lines in my
patch are indentation changes, so the actual change is not big.

Vanilla kernel:

(without tcpdump)
ruiu@...e:~$ time ./udpflood 10000000
real	0m57.909s
user	0m1.368s
sys	0m56.484s

ruiu@...e:~$ time ./udpflood 10000000
real	0m57.686s
user	0m1.360s
sys	0m56.288s

ruiu@...e:~$ time ./udpflood 10000000
real	0m58.457s
user	0m1.300s
sys	0m57.116s

(with tcpdump)
ruiu@...e:~$ time ./udpflood 10000000
real	1m16.025s
user	0m1.464s
sys	1m14.505s

ruiu@...e:~$ time ./udpflood 10000000
real	1m15.860s
user	0m1.232s
sys	1m14.573s

ruiu@...e:~$ time ./udpflood 10000000
real	1m16.861s
user	0m1.504s
sys	1m15.301s


Kernel with the patch:

(without tcpdump)
ruiu@...e:~$ time ./udpflood 10000000

real	0m59.272s
user	0m1.308s
sys	0m57.924s

ruiu@...e:~$ time ./udpflood 10000000

real	0m59.624s
user	0m1.336s
sys	0m58.244s

ruiu@...e:~$ time ./udpflood 10000000

real	0m59.340s
user	0m1.240s
sys	0m58.056s

(with tcpdump)
ruiu@...e:~$ time ./udpflood 10000000

real	1m15.392s
user	0m1.372s
sys	1m13.965s

ruiu@...e:~$ time ./udpflood 10000000

real	1m15.352s
user	0m1.452s
sys	1m13.845s

ruiu@...e:~$ time ./udpflood 10000000

real	1m15.508s
user	0m1.464s
sys	1m13.989s

The tcpdump command I used: tcpdump -p -n -s -i lo net 192.168.2.0/24

On Fri, Jul 29, 2011 at 2:30 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Friday, 29 July 2011 at 01:10 -0700, Rui Ueyama wrote:
>> Convert the BPF VM to threaded code to improve performance.
>>
>> The BPF VM is basically a big for loop containing a switch statement.  That is
>> slow because for each instruction it checks the for loop condition and does the
>> conditional branch of the switch statement.
>>
>> This patch eliminates the conditional branch by replacing it with a jump table
>> using GCC's labels-as-values feature. The for loop condition check can also be
>> removed, because the filter code always ends with a RET instruction.
>>
>
> Well...
>
>
>> +#define NEXT goto *jump_table[(++fentry)->code]
>> +
>> +     /* Dispatch the first instruction */
>> +     goto *jump_table[fentry->code];
>
> This is the killer, as this cannot be predicted by the CPU.
>
> Do you have benchmark results to provide?
>
> We now have BPF JIT on x86_64 and powerpc, and possibly on MIPS and ARM
> in the near future.
>
>
>
>