Date:	Tue, 12 May 2015 23:43:23 +0200
From:	Daniel Borkmann <daniel@...earbox.net>
To:	Alexei Starovoitov <ast@...mgrid.com>,
	Pablo Neira Ayuso <pablo@...filter.org>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	netdev@...r.kernel.org, davem@...emloft.net, jhs@...atatu.com
Subject: Re: [PATCH 2/2 net-next] net: move qdisc ingress filtering code where
 it belongs

On 05/12/2015 11:22 PM, Alexei Starovoitov wrote:
> On 5/12/15 6:27 AM, Daniel Borkmann wrote:
>>
>>> What's the i-cache size in your testbed?
>>
>> For the Xeon E3-1240, I get (via lscpu):
>>
>> L1d cache:             32K
>> L1i cache:             32K
>> L2 cache:              256K
>> L3 cache:              8192K
>
> my E5-1630 v3 @ 3.70GHz:
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              10240K
>
> I think it's not cpu that is causing discrepancies
> between our numbers, but the difference in compilers or flags.
>
> Looking at Pablo's perf profile:
>      36.12%  kpktgend_0  [kernel.kallsyms]  [k] __netif_receive_skb_core
>      18.46%  kpktgend_0  [kernel.kallsyms]  [k] atomic_dec_and_test
>      15.87%  kpktgend_0  [kernel.kallsyms]  [k] deliver_ptype_list_skb
>       5.04%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
>       4.81%  kpktgend_0  [kernel.kallsyms]  [k] netif_receive_skb_internal
>       4.11%  kpktgend_0  [kernel.kallsyms]  [k] kfree_skb
>       3.89%  kpktgend_0  [kernel.kallsyms]  [k] ip_rcv
>
> It means that deliver_ptype_list_skb() is not inlined, which is odd,
> and atomic_dec_and_test() from kfree_skb() is not inlined either.
> Both functions are marked 'static inline'. So I suspect the kernel was
> compiled with some broken gcc or CONFIG_CC_OPTIMIZE_FOR_SIZE is set.
> If gcc is old/broken, it's really bad, since it can be mis-optimizing
> a bunch of other things.
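
One way to check which of the two suspects applies is to ask the compiler itself. The sketch below relies on the predefined gcc macros __OPTIMIZE__, __OPTIMIZE_SIZE__ and __NO_INLINE__; the enum and function names are hypothetical and only for illustration, and it assumes that CONFIG_CC_OPTIMIZE_FOR_SIZE shows up in the build as -Os.

```c
/* Hypothetical names; only the __OPTIMIZE*__ / __NO_INLINE__ macros come from gcc. */
enum build_mode {
	BUILD_UNOPTIMIZED = 0,	/* e.g. -O0 */
	BUILD_FOR_SPEED   = 1,	/* e.g. -O1, -O2, -O3 */
	BUILD_FOR_SIZE    = 2	/* -Os */
};

/* Report how this translation unit was compiled. */
enum build_mode detect_build_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
	/* -Os defines __OPTIMIZE_SIZE__ in addition to __OPTIMIZE__, so test it first. */
	return BUILD_FOR_SIZE;
#elif defined(__OPTIMIZE__)
	return BUILD_FOR_SPEED;
#else
	return BUILD_UNOPTIMIZED;
#endif
}

/* Returns 1 if the compiler has announced that it will not inline functions. */
int inlining_disabled(void)
{
#ifdef __NO_INLINE__
	return 1;
#else
	return 0;
#endif
}
```

Building this file with the same flags as net/core/dev.c and calling detect_build_mode() would show whether -Os is in effect for that build.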

There was a recent lkml thread from Hagen wrt bad inlining heuristics
of gcc:

   https://lkml.org/lkml/2015/4/20/637
   https://lkml.org/lkml/2015/4/23/598

"Here is the situation: the inlining problem occur with the 4.9.x
  branch - I tried to reproduce it with 4.8.x and saw *no* problems."

[ I was using: gcc (GCC) 4.8.3 20140624 (Red Hat 4.8.3-1) ]
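
To illustrate why the compiler version matters here: plain 'static inline' is only a hint that gcc's heuristics may ignore, while the always_inline attribute takes the decision away from the heuristics. A minimal userspace sketch, assuming gcc or clang; my_always_inline, struct refcnt and all function names are made up for the example, and the counter is deliberately not atomic, so this is not the kernel's atomic_dec_and_test().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's __always_inline. */
#define my_always_inline inline __attribute__((__always_inline__))

/* Simplified, non-atomic counter for illustration only. */
struct refcnt {
	int counter;
};

/* Hint only: the compiler may still emit this as an out-of-line call. */
static inline bool dec_and_test_hint(struct refcnt *r)
{
	return --r->counter == 0;
}

/* Forced: the body is expanded at every call site, whatever the heuristics say. */
static my_always_inline bool dec_and_test_forced(struct refcnt *r)
{
	return --r->counter == 0;
}

/*
 * Drop two counters from 'start' to zero and count how many decrements
 * did not reach zero. Both helpers behave identically; they differ only
 * in whether a call instruction may appear in the generated code.
 */
int drop_refs(int start)
{
	struct refcnt a = { start }, b = { start };
	int not_last = 0;

	while (!dec_and_test_hint(&a))
		not_last++;
	while (!dec_and_test_forced(&b))
		not_last++;
	return not_last;
}
```

Compiling this with different gcc versions and looking at the symbol table (or at a perf profile, as above) shows whether dec_and_test_hint survives as a separate function; dec_and_test_forced should not.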

> If optimize_for_size is set, then it's not great for performance
> either, since compiler will be trying way too hard to squeeze
> code size and losing performance left and right.
> btw, there is a patch pending on lkml to make
> atomic_dec_and_test() __always_inline.
>
> -Os is also causing static_key to ignore 'unlikely', so all cold
> branches are generated as fall through, which causes I-cache misses.
> I've looked at net/core/dev.s with -Os and it's not pretty.
> bstats_update, deliver_skb, deliver_ptype_list_skb are all not inlined.
>
> There was a thread on lkml recently to request better behaving -Os from
> gcc guys, but I think it didn't go anywhere.
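
For reference, the branch-layout point can be tried outside the kernel with a small sketch like the one below. It assumes gcc or clang; my_unlikely mirrors the usual __builtin_expect idiom, and handle_rare/process are hypothetical names. It does not use static_key at all, it only shows the 'unlikely' hint whose effect on code layout can then be compared between -O2 and -Os in the generated assembly.

```c
/* Hypothetical stand-in for the kernel's unlikely(). */
#define my_unlikely(x) __builtin_expect(!!(x), 0)

/* Number of times the rare path was taken. */
static long slow_path_hits;

/* Cold path: kept out of line so it does not pollute the hot code. */
__attribute__((noinline, cold))
static int handle_rare(int v)
{
	slow_path_hits++;
	return -v;
}

/*
 * Hot path: with the hint honoured, the common case (v >= 0) should be
 * the fall-through and the rare case a taken branch to code placed
 * elsewhere. Whether that actually happens depends on compiler and flags.
 */
int process(int v)
{
	if (my_unlikely(v < 0))
		return handle_rare(v);
	return v * 2;
}
```

Running 'gcc -S' on this once with -O2 and once with -Os and comparing the order of the two paths in process() is a quick way to see how each setting treats the hint.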