Message-ID: <555273FB.6040800@iogearbox.net>
Date: Tue, 12 May 2015 23:43:23 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Alexei Starovoitov <ast@...mgrid.com>,
Pablo Neira Ayuso <pablo@...filter.org>,
Eric Dumazet <eric.dumazet@...il.com>
CC: netdev@...r.kernel.org, davem@...emloft.net, jhs@...atatu.com
Subject: Re: [PATCH 2/2 net-next] net: move qdisc ingress filtering code where
it belongs
On 05/12/2015 11:22 PM, Alexei Starovoitov wrote:
> On 5/12/15 6:27 AM, Daniel Borkmann wrote:
>>
>>> What's the i-cache size in your testbed?
>>
>> For the Xeon E3-1240, I get (via lscpu):
>>
>> L1d cache: 32K
>> L1i cache: 32K
>> L2 cache: 256K
>> L3 cache: 8192K
>
> my E5-1630 v3 @ 3.70GHz:
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 10240K
>
> I think it's not cpu that is causing discrepancies
> between our numbers, but the difference in compilers or flags.
>
> Looking at Pablo's perf profile:
> 36.12% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core
> 18.46% kpktgend_0 [kernel.kallsyms] [k] atomic_dec_and_test
> 15.87% kpktgend_0 [kernel.kallsyms] [k] deliver_ptype_list_skb
> 5.04% kpktgend_0 [pktgen] [k] pktgen_thread_worker
> 4.81% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal
> 4.11% kpktgend_0 [kernel.kallsyms] [k] kfree_skb
> 3.89% kpktgend_0 [kernel.kallsyms] [k] ip_rcv
>
> It means that deliver_ptype_list_skb() is not inlined, which is odd,
> and atomic_dec_and_test() from kfree_skb() is not inlined either.
> Both functions are marked 'static inline'. So I suspect the kernel was
> compiled with some broken gcc or CONFIG_CC_OPTIMIZE_FOR_SIZE is set.
> If gcc is old/broken, it's really bad, since it can be mis-optimizing
> a bunch of other things.

There was a recent lkml thread from Hagen wrt bad inlining heuristics
of gcc:

https://lkml.org/lkml/2015/4/20/637
https://lkml.org/lkml/2015/4/23/598

"Here is the situation: the inlining problem occur with the 4.9.x
branch - I tried to reproduce it with 4.8.x and saw *no* problems."

[ I was using: gcc (GCC) 4.8.3 20140624 (Red Hat 4.8.3-1) ]

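
To illustrate the 'static inline' point discussed above, here is a minimal
userspace sketch, not kernel code. The names struct counter,
counter_dec_and_test(), counter_dec_and_test_forced(), put_ref_hint() and
put_ref() are hypothetical stand-ins; only the GCC always_inline attribute
and the __atomic_sub_fetch() builtin are real. The assumption is that plain
'inline' is a hint the compiler may decline (for example under -Os), while
always_inline requests inlining regardless of the heuristics.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcount; not the kernel's atomic_t. */
struct counter {
	int value;
};

/*
 * 'static inline' is only a hint. If the compiler declines it, this
 * becomes an out-of-line function and can appear as its own symbol in
 * a perf profile.
 */
static inline bool counter_dec_and_test(struct counter *c)
{
	/* Atomically decrement and report whether the result is zero. */
	return __atomic_sub_fetch(&c->value, 1, __ATOMIC_SEQ_CST) == 0;
}

/* Same body, but GCC is asked to inline it at every call site. */
static inline __attribute__((always_inline))
bool counter_dec_and_test_forced(struct counter *c)
{
	return __atomic_sub_fetch(&c->value, 1, __ATOMIC_SEQ_CST) == 0;
}

/* Hypothetical callers, non-static so they survive in the object file. */
bool put_ref_hint(struct counter *c)
{
	return counter_dec_and_test(c);
}

bool put_ref(struct counter *c)
{
	return counter_dec_and_test_forced(c);
}
```

Building this once with -O2 and once with -Os and comparing the symbol
tables of the two objects is one way to see whether the hinted variant was
kept out of line by a given compiler; the result depends on the gcc version
and flags, so none is claimed here.
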
> If optimize_for_size is set, then it's not great for performance
> either, since the compiler will be trying way too hard to squeeze
> code size and losing performance left and right.
> btw, there is a patch pending on lkml to make
> atomic_dec_and_test() __always_inline.
>
> -Os is also causing static_key to ignore 'unlikely', so all cold
> branches are generated as fall-through, which causes I-cache misses.
> I've looked at net/core/dev.s with -Os and it's not pretty.
> bstats_update, deliver_skb, deliver_ptype_list_skb are all not inlined.
>
> There was a thread on lkml recently to request better behaving -Os from
> gcc guys, but I think it didn't go anywhere.
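
The branch layout concern quoted above can be sketched in plain C. This is
a simplified analogy only: kernel static keys are implemented with jump
labels rather than __builtin_expect(), and process() and slow_path() are
hypothetical names. The idea is that the hint lets the compiler keep the
common path as straight-line fall-through code and move the rare path out
of the way, which is the layout reported as lost under -Os.

```c
/* Userspace equivalent of the kernel's unlikely() annotation. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical rare path. 'cold' tells GCC this function is seldom
 * executed, and 'noinline' keeps it out of the hot caller.
 */
__attribute__((noinline, cold))
int slow_path(int v)
{
	return -v;
}

/* Hypothetical hot function with a rarely taken branch. */
int process(int v, int rare_flag)
{
	if (unlikely(rare_flag))
		return slow_path(v);	/* expected to be laid out off the hot path */

	return v + 1;			/* common case, ideally the fall-through */
}
```

Whether the cold branch really ends up off the hot path depends on the
compiler and on the optimization level, so the generated assembly (for
example from gcc -S) is the thing to inspect.
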