Message-ID: <55526F20.9020704@plumgrid.com>
Date: Tue, 12 May 2015 14:22:40 -0700
From: Alexei Starovoitov <ast@...mgrid.com>
To: Daniel Borkmann <daniel@...earbox.net>,
Pablo Neira Ayuso <pablo@...filter.org>,
Eric Dumazet <eric.dumazet@...il.com>
CC: netdev@...r.kernel.org, davem@...emloft.net, jhs@...atatu.com
Subject: Re: [PATCH 2/2 net-next] net: move qdisc ingress filtering code where
it belongs
On 5/12/15 6:27 AM, Daniel Borkmann wrote:
>
>> What's the i-cache size in your testbed?
>
> For the Xeon E3-1240, I get (via lscpu):
>
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 8192K
my E5-1630 v3 @ 3.70GHz:
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 10240K
I don't think the cpu is causing the discrepancies between our
numbers; more likely it's a difference in compilers or flags.
Looking at Pablo's perf profile:
36.12% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core
18.46% kpktgend_0 [kernel.kallsyms] [k] atomic_dec_and_test
15.87% kpktgend_0 [kernel.kallsyms] [k] deliver_ptype_list_skb
5.04% kpktgend_0 [pktgen] [k] pktgen_thread_worker
4.81% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal
4.11% kpktgend_0 [kernel.kallsyms] [k] kfree_skb
3.89% kpktgend_0 [kernel.kallsyms] [k] ip_rcv
It means that deliver_ptype_list_skb() is not inlined, which is odd,
and atomic_dec_and_test() from kfree_skb() is not inlined either.
Both functions are marked 'static inline'. So I suspect the kernel was
compiled with some broken gcc or CONFIG_CC_OPTIMIZE_FOR_SIZE is set.
If gcc is old/broken, it's really bad, since it can be mis-optimizing
a bunch of other things.
If optimize_for_size is set, then it's not great for performance
either, since the compiler will be trying way too hard to squeeze
code size and losing performance left and right.
btw, there is a patch pending on lkml to make
atomic_dec_and_test() __always_inline.
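To illustrate the difference between a plain 'static inline' hint and a forced inline, here is a minimal userspace sketch. It is not the kernel's atomic_dec_and_test() and not the pending patch: every demo_* name is hypothetical, the counter is a plain int updated through the gcc __atomic_sub_fetch builtin, and demo_always_inline is an assumed stand-in that mirrors how the kernel spells __always_inline.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: userspace stand-in mirroring the kernel's __always_inline. */
#define demo_always_inline inline __attribute__((__always_inline__))

/*
 * 'static inline' is only a hint: gcc may still emit an out-of-line
 * copy and a real call, for example when optimizing for size.
 */
static inline bool demo_dec_and_test_hint(int *v)
{
	return __atomic_sub_fetch(v, 1, __ATOMIC_SEQ_CST) == 0;
}

/*
 * With the always_inline attribute gcc inlines the body at each call
 * site regardless of the optimization level, or reports an error.
 */
static demo_always_inline bool demo_dec_and_test_forced(int *v)
{
	return __atomic_sub_fetch(v, 1, __ATOMIC_SEQ_CST) == 0;
}

/* Hypothetical release helpers: return 1 when the last reference drops. */
int demo_put_hint(int *refcnt)
{
	assert(*refcnt > 0);
	return demo_dec_and_test_hint(refcnt) ? 1 : 0;
}

int demo_put_forced(int *refcnt)
{
	assert(*refcnt > 0);
	return demo_dec_and_test_forced(refcnt) ? 1 : 0;
}
```

Compiling this with 'gcc -Os -S' and looking for a call to demo_dec_and_test_hint in the output is one way to check whether a given compiler honours the hint. The result depends on the gcc version and flags.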
-Os is also causing static_key to ignore 'unlikely', so all cold
branches are generated as fall-through, which causes I-cache misses.
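As a rough illustration of what an 'unlikely' annotation is meant to do, here is a minimal sketch using an ordinary branch with __builtin_expect. It is a plain-branch analogue and not the static_key/jump-label mechanism itself. The demo_* names and the packet struct are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: local equivalents of the usual likely()/unlikely() macros. */
#define demo_likely(x)   __builtin_expect(!!(x), 1)
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical packet descriptor, just enough for the example. */
struct demo_pkt {
	int len;
};

static int demo_cold_hits;

/*
 * The error branch is marked unlikely, so the compiler is asked to
 * keep the hot path as the fall-through and move the cold block out
 * of the way. Whether it does so depends on the optimization flags.
 */
int demo_rx_check(const struct demo_pkt *p)
{
	assert(p != NULL);
	if (demo_unlikely(p->len <= 0)) {
		demo_cold_hits++;	/* cold path: count and reject */
		return -1;
	}
	return p->len;			/* hot path */
}

/* Accessor for the cold-path counter. */
int demo_cold_path_hits(void)
{
	return demo_cold_hits;
}
```

Comparing the 'gcc -O2 -S' and 'gcc -Os -S' output for demo_rx_check() shows where the cold block ends up relative to the hot path for a particular compiler.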
I've looked at net/core/dev.s with -Os and it's not pretty.
bstats_update, deliver_skb, deliver_ptype_list_skb are all not inlined.
There was a thread on lkml recently requesting a better-behaving -Os
from the gcc guys, but I think it didn't go anywhere.