netdev - Re: [PATCH] net: filter: Convert the BPF VM to threaded code

Message-ID: <CACKH++afaAaa7a6ViYjo_PjpF1bXYtOuJaa-4umEOSVgW1+g3w@mail.gmail.com>
Date:	Fri, 29 Jul 2011 22:09:36 -0700
From:	Rui Ueyama <rui314@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: [PATCH] net: filter: Convert the BPF VM to threaded code

The benchmark results look good. A simple benchmark that sends 10M UDP
packets to lo took 76.24 seconds on average on a Core 2 Duo L7500@...GHz while
tcpdump was running. With this patch it took 75.41 seconds, which means we save
about 80ns per packet on that processor.
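
As a cross-check of the arithmetic, here is a small sketch in C that
recomputes the per-packet saving from the "real" times of the with-tcpdump
runs listed below. The function and array names are made up for illustration
and are not part of the patch or of udpflood.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n wall-clock samples, in seconds. */
double mean_seconds(const double *samples, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(samples != NULL && n > 0);
	for (i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/*
 * Difference of the two means, spread over the number of packets sent
 * and converted from seconds to nanoseconds.
 */
double saving_ns_per_packet(const double *before, const double *after,
			    size_t n, double packets)
{
	return (mean_seconds(before, n) - mean_seconds(after, n))
		/ packets * 1e9;
}

/* "real" times of the three runs with tcpdump attached, in seconds. */
const double vanilla_with_tcpdump[3] = { 76.025, 75.860, 76.861 };
const double patched_with_tcpdump[3] = { 75.392, 75.352, 75.508 };
```

Calling saving_ns_per_packet(vanilla_with_tcpdump, patched_with_tcpdump, 3, 1e7)
gives roughly 83ns, consistent with the rounded 80ns figure above.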

I think converting the VM to threaded code is low-hanging fruit, even if we
have JIT compilers for popular architectures. Most of the lines in my patch
are indentation changes, so the actual change is not big.
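
For readers who have not seen the technique, the following is a minimal
sketch of the two dispatch styles on a toy three-instruction machine. It is
not the code from the patch: the opcodes, struct, and function names are
hypothetical, and only the NEXT macro and the first-instruction dispatch
mirror the lines quoted further down. It needs GCC's labels-as-values
extension (Clang also accepts it) and assumes the program has been checked
beforehand to contain only valid opcodes and to end with a RET.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; these are NOT the kernel's BPF opcodes. */
enum toy_op { OP_LOAD_IMM, OP_ADD_IMM, OP_RET, OP_COUNT };

struct toy_insn {
	unsigned char code;	/* one of enum toy_op */
	unsigned int k;		/* immediate operand */
};

/* Classic interpreter: a for loop wrapped around a switch statement. */
unsigned int run_switch(const struct toy_insn *prog, size_t len)
{
	unsigned int acc = 0;
	size_t pc;

	assert(prog != NULL);
	for (pc = 0; pc < len; pc++) {
		switch (prog[pc].code) {
		case OP_LOAD_IMM:
			acc = prog[pc].k;
			break;
		case OP_ADD_IMM:
			acc += prog[pc].k;
			break;
		case OP_RET:
			return acc;
		default:
			return 0;	/* unknown opcode */
		}
	}
	return 0;
}

/*
 * Threaded version: each handler jumps straight to the next handler
 * through a table of label addresses. There is no loop condition, so
 * the program must end with OP_RET and contain only valid opcodes.
 */
unsigned int run_threaded(const struct toy_insn *prog)
{
	static const void *const jump_table[OP_COUNT] = {
		[OP_LOAD_IMM] = &&do_load_imm,
		[OP_ADD_IMM]  = &&do_add_imm,
		[OP_RET]      = &&do_ret,
	};
	const struct toy_insn *fentry = prog;
	unsigned int acc = 0;

	assert(prog != NULL);

#define NEXT goto *jump_table[(++fentry)->code]

	/* Dispatch the first instruction */
	goto *jump_table[fentry->code];

do_load_imm:
	acc = fentry->k;
	NEXT;
do_add_imm:
	acc += fentry->k;
	NEXT;
do_ret:
	return acc;

#undef NEXT
}
```

Both functions return the same accumulator value for the same program; the
difference is only in how control moves from one instruction to the next.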

Vanilla kernel:

(without tcpdump)
ruiu@...e:~$ time ./udpflood 10000000
real	0m57.909s
user	0m1.368s
sys	0m56.484s

ruiu@...e:~$ time ./udpflood 10000000
real	0m57.686s
user	0m1.360s
sys	0m56.288s

ruiu@...e:~$ time ./udpflood 10000000
real	0m58.457s
user	0m1.300s
sys	0m57.116s

(with tcpdump)
ruiu@...e:~$ time ./udpflood 10000000
real	1m16.025s
user	0m1.464s
sys	1m14.505s

ruiu@...e:~$ time ./udpflood 10000000

real	1m15.860s
user	0m1.232s
sys	1m14.573s

ruiu@...e:~$ time ./udpflood 10000000

real	1m16.861s
user	0m1.504s
sys	1m15.301s


Kernel with the patch:

(without tcpdump)
ruiu@...e:~$ time ./udpflood 10000000

real	0m59.272s
user	0m1.308s
sys	0m57.924s

ruiu@...e:~$ time ./udpflood 10000000

real	0m59.624s
user	0m1.336s
sys	0m58.244s

ruiu@...e:~$ time ./udpflood 10000000

real	0m59.340s
user	0m1.240s
sys	0m58.056s

(with tcpdump)
ruiu@...e:~$ time ./udpflood 10000000

real	1m15.392s
user	0m1.372s
sys	1m13.965s

ruiu@...e:~$ time ./udpflood 10000000

real	1m15.352s
user	0m1.452s
sys	1m13.845s

ruiu@...e:~$ time ./udpflood 10000000

real	1m15.508s
user	0m1.464s
sys	1m13.989s

The tcpdump command I used was: tcpdump -p -n -s -i lo net 192.168.2.0/24

On Fri, Jul 29, 2011 at 2:30 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Friday, 29 July 2011 at 01:10 -0700, Rui Ueyama wrote:
>> Convert the BPF VM to threaded code to improve performance.
>>
>> The BPF VM is basically a big for loop containing a switch statement.  That is
>> slow because for each instruction it checks the for loop condition and does the
>> conditional branch of the switch statement.
>>
>> This patch eliminates the conditional branch by replacing it with a jump table
>> using GCC's labels-as-values feature. The for loop condition check can also be
>> removed, because the filter code always ends with a RET instruction.
>>
>
> Well...
>
>
>> +#define NEXT goto *jump_table[(++fentry)->code]
>> +
>> +     /* Dispatch the first instruction */
>> +     goto *jump_table[fentry->code];
>
> This is the killer, as this cannot be predicted by the cpu.
>
> Do you have benchmark results to provide ?
>
> We now have BPF JIT on x86_64 and powerpc, and possibly on MIPS and ARM
> in the near future.
>
>
>
>