Date: Fri, 29 Jul 2011 22:09:36 -0700
From: Rui Ueyama <rui314@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev@...r.kernel.org
Subject: Re: [PATCH] net: filter: Convert the BPF VM to threaded code

The benchmark results look good. A simple benchmark that sends 10M UDP
packets to lo took 76.24 seconds on average on a Core 2 Duo L7500@...GHz
when tcpdump is running. With this patch it took 75.41 seconds, which
means we save roughly 80ns per packet on that processor.

I think converting the VM to threaded code is low-hanging fruit, even if
we have JIT compilers for popular architectures. Most of the lines in my
patch are indentation changes, so the actual change is not big.

Vanilla kernel:

(without tcpdump)
ruiu@...e:~$ time ./udpflood 10000000
real 0m57.909s
user 0m1.368s
sys 0m56.484s
ruiu@...e:~$ time ./udpflood 10000000
real 0m57.686s
user 0m1.360s
sys 0m56.288s
ruiu@...e:~$ time ./udpflood 10000000
real 0m58.457s
user 0m1.300s
sys 0m57.116s

(with tcpdump)
ruiu@...e:~$ time ./udpflood 10000000
real 1m16.025s
user 0m1.464s
sys 1m14.505s
ruiu@...e:~$ time ./udpflood 10000000
real 1m15.860s
user 0m1.232s
sys 1m14.573s
ruiu@...e:~$ time ./udpflood 10000000
real 1m16.861s
user 0m1.504s
sys 1m15.301s

Kernel with the patch:

(without tcpdump)
ruiu@...e:~$ time ./udpflood 10000000
real 0m59.272s
user 0m1.308s
sys 0m57.924s
ruiu@...e:~$ time ./udpflood 10000000
real 0m59.624s
user 0m1.336s
sys 0m58.244s
ruiu@...e:~$ time ./udpflood 10000000
real 0m59.340s
user 0m1.240s
sys 0m58.056s

(with tcpdump)
ruiu@...e:~$ time ./udpflood 10000000
real 1m15.392s
user 0m1.372s
sys 1m13.965s
ruiu@...e:~$ time ./udpflood 10000000
real 1m15.352s
user 0m1.452s
sys 1m13.845s
ruiu@...e:~$ time ./udpflood 10000000
real 1m15.508s
user 0m1.464s
sys 1m13.989s

The tcpdump command I used is:

tcpdump -p -n -s -i lo net 192.168.2.0/24

On Fri, Jul 29, 2011 at 2:30 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Friday, 29 July 2011 at 01:10 -0700, Rui Ueyama wrote:
>> Convert the BPF VM to threaded code to improve performance.
>>
>> The BPF VM is basically a big for loop containing a switch statement. That is
>> slow because for each instruction it checks the for loop condition and does the
>> conditional branch of the switch statement.
>>
>> This patch eliminates the conditional branch by replacing it with a jump table,
>> using GCC's labels-as-values feature. The for loop condition check can also be
>> removed, because the filter code always ends with a RET instruction.
>>
>
> Well...
>
>> +#define NEXT goto *jump_table[(++fentry)->code]
>> +
>> +	/* Dispatch the first instruction */
>> +	goto *jump_table[fentry->code];
>
> This is the killer, as this cannot be predicted by the cpu.
>
> Do you have benchmark results to provide?
>
> We now have BPF JIT on x86_64 and powerpc, and possibly on MIPS and ARM
> in the near future.
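To make the two dispatch styles in this thread concrete, here is a minimal sketch of a tiny interpreter written both ways. It assumes a hypothetical toy instruction set: the `OP_*` opcodes, `struct insn`, `run_switch` and `run_threaded` are illustrative names, not the kernel's `sk_run_filter()` or its `BPF_*` codes. Only the `NEXT` macro and the first-instruction dispatch mirror the quoted patch. `run_threaded` relies on the GCC labels-as-values extension (`&&label` and `goto *`), which Clang also accepts, so it will not build as strict ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set: an accumulator machine loosely
 * modeled on classic BPF (accumulator A, immediate operand k).
 * These opcodes are illustrative only, not the kernel's BPF_* codes. */
enum op { OP_LD_IMM, OP_ADD_IMM, OP_MUL_IMM, OP_RET_A, OP_RET_K };

struct insn {
	enum op code;
	uint32_t k;
};

/* Style 1: a for loop around a switch. Each instruction pays for the
 * loop condition check plus the branch(es) the compiler emits for the
 * switch. */
uint32_t run_switch(const struct insn *prog, size_t len)
{
	uint32_t A = 0;

	for (size_t pc = 0; pc < len; pc++) {
		const struct insn *fentry = &prog[pc];

		switch (fentry->code) {
		case OP_LD_IMM:  A = fentry->k;  break;
		case OP_ADD_IMM: A += fentry->k; break;
		case OP_MUL_IMM: A *= fentry->k; break;
		case OP_RET_A:   return A;
		case OP_RET_K:   return fentry->k;
		}
	}
	/* Valid programs end in a RET, so this should be unreachable. */
	assert(!"program ran past its last instruction without a RET");
	return 0;
}

/* Style 2: threaded code via GCC's labels-as-values extension.
 * Each handler jumps directly to the next handler through a table of
 * label addresses. There is no loop condition, so (as in the patch)
 * every program must end in a RET instruction. */
uint32_t run_threaded(const struct insn *prog)
{
	static const void *const jump_table[] = {
		[OP_LD_IMM]  = &&do_ld_imm,
		[OP_ADD_IMM] = &&do_add_imm,
		[OP_MUL_IMM] = &&do_mul_imm,
		[OP_RET_A]   = &&do_ret_a,
		[OP_RET_K]   = &&do_ret_k,
	};
	const struct insn *fentry = prog;
	uint32_t A = 0;

/* Advance to the next instruction and jump to its handler. */
#define NEXT goto *jump_table[(++fentry)->code]

	/* Dispatch the first instruction */
	goto *jump_table[fentry->code];

do_ld_imm:  A = fentry->k;  NEXT;
do_add_imm: A += fentry->k; NEXT;
do_mul_imm: A *= fentry->k; NEXT;
do_ret_a:   return A;
do_ret_k:   return fentry->k;
#undef NEXT
}
```

For the program LD 3; ADD 4; MUL 5; RET A, both functions compute (3 + 4) * 5 = 35. The threaded version drops the loop condition check but still executes one indirect jump per instruction, whose target depends on the next opcode; that indirect jump is what Eric's branch-prediction objection is about, which is why the question is settled by measurement rather than by reasoning about the code alone.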
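The per-packet saving quoted in the message can be checked from the `real` times of the three tcpdump runs on each kernel, with 10M packets per run:

```latex
\begin{aligned}
\bar t_{\text{vanilla}} &= \tfrac{1}{3}\,(76.025 + 75.860 + 76.861)\,\text{s} \approx 76.249\,\text{s} \\
\bar t_{\text{patched}} &= \tfrac{1}{3}\,(75.392 + 75.352 + 75.508)\,\text{s} \approx 75.417\,\text{s} \\
\Delta &= \frac{\bar t_{\text{vanilla}} - \bar t_{\text{patched}}}{10^{7}\ \text{packets}}
        \approx \frac{0.831\,\text{s}}{10^{7}} \approx 83\,\text{ns per packet}
\end{aligned}
```

This is consistent with the roughly 80ns per packet stated above.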