Date:	Tue, 12 May 2015 14:22:40 -0700
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	Daniel Borkmann <daniel@...earbox.net>,
	Pablo Neira Ayuso <pablo@...filter.org>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	netdev@...r.kernel.org, davem@...emloft.net, jhs@...atatu.com
Subject: Re: [PATCH 2/2 net-next] net: move qdisc ingress filtering code where it belongs

On 5/12/15 6:27 AM, Daniel Borkmann wrote:
>
>> What's the i-cache size in your testbed?
>
> For the Xeon E3-1240, I get (via lscpu):
>
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              8192K

my E5-1630 v3 @ 3.70GHz:
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              10240K

I don't think the CPU is causing the discrepancies between our
numbers; more likely it's a difference in compilers or compiler flags.
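One cheap way to compare the two testbeds is to have each build report which compiler and optimization mode it was built with. A minimal userspace sketch, assuming GCC or Clang: it relies only on the predefined macros __VERSION__, __OPTIMIZE__ and __OPTIMIZE_SIZE__ (the latter is defined under -Os). The function names are hypothetical, and this reports the flags of this translation unit only, not the kernel's CONFIG_CC_OPTIMIZE_FOR_SIZE setting.

```c
#include <stdio.h>

/* Hypothetical helper: label the optimization mode this translation
 * unit was compiled with, using GCC/Clang predefined macros. */
const char *build_opt_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
	return "size (-Os)";
#elif defined(__OPTIMIZE__)
	return "speed (-O1 or higher)";
#else
	return "none (-O0)";
#endif
}

/* Hypothetical helper: compiler version string, if the compiler provides one. */
const char *build_compiler_version(void)
{
#if defined(__VERSION__)
	return __VERSION__;
#else
	return "unknown";
#endif
}

/* Print both, so two machines' outputs can be compared side by side. */
void print_build_info(FILE *out)
{
	fprintf(out, "compiler: %s\noptimization: %s\n",
		build_compiler_version(), build_opt_mode());
}
```

Calling print_build_info(stdout) on each box and diffing the output is enough to spot an -Os build or a different gcc.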

Looking at Pablo's perf profile:
     36.12%  kpktgend_0  [kernel.kallsyms]  [k] __netif_receive_skb_core
     18.46%  kpktgend_0  [kernel.kallsyms]  [k] atomic_dec_and_test
     15.87%  kpktgend_0  [kernel.kallsyms]  [k] deliver_ptype_list_skb
      5.04%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
      4.81%  kpktgend_0  [kernel.kallsyms]  [k] netif_receive_skb_internal
      4.11%  kpktgend_0  [kernel.kallsyms]  [k] kfree_skb
      3.89%  kpktgend_0  [kernel.kallsyms]  [k] ip_rcv

It means that deliver_ptype_list_skb() is not inlined, which is odd,
and atomic_dec_and_test() called from kfree_skb() is not inlined either.
Both functions are marked 'static inline', so I suspect the kernel was
compiled with a broken gcc or with CONFIG_CC_OPTIMIZE_FOR_SIZE set.
If gcc is old/broken, that's really bad, since it may be mis-optimizing
a bunch of other things too.
If optimize_for_size is set, that's not great for performance
either, since the compiler will try way too hard to squeeze
code size and lose performance left and right.
btw, there is a patch pending on lkml to make
atomic_dec_and_test() __always_inline.
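To illustrate the difference: 'static inline' is only a hint that gcc may ignore (e.g. under -Os), whereas the always_inline attribute, which the kernel's __always_inline macro wraps, makes gcc inline the call or fail the build. A minimal userspace sketch, assuming C11 <stdatomic.h>; the sketch_* names and the refcount type are hypothetical stand-ins, not the kernel's atomic_dec_and_test() implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's __always_inline macro. */
#define sketch_always_inline inline __attribute__((__always_inline__))

/* Hypothetical refcount, standing in for the skb users count. */
struct sketch_ref {
	atomic_int count;
};

/* Plain 'static inline': only a hint. Under -Os gcc may emit an
 * out-of-line copy and call it, as seen in the perf profile above. */
static inline bool sketch_dec_and_test_hint(struct sketch_ref *r)
{
	/* atomic_fetch_sub returns the old value: true when it drops to 0. */
	return atomic_fetch_sub(&r->count, 1) == 1;
}

/* always_inline: gcc inlines this at every call site, or reports an
 * error if it cannot, regardless of -Os. */
static sketch_always_inline bool sketch_dec_and_test_forced(struct sketch_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

Both return the same result; the only difference is how much freedom the compiler has to turn the call into a real function call.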

-Os also causes static_key to ignore 'unlikely', so all cold
branches are generated as fall-through, which causes I-cache misses.
I've looked at net/core/dev.s built with -Os and it's not pretty:
bstats_update, deliver_skb and deliver_ptype_list_skb are all not inlined.
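For reference, the plain likely()/unlikely() hints boil down to __builtin_expect; the sketch below mirrors the kernel's definitions in include/linux/compiler.h (without branch profiling). It does not reproduce static_key/jump labels. It only shows the intended layout: the hot path falls through, and the rare path sits behind unlikely() in an out-of-line cold function, which gcc may place in a separate .text.unlikely section. The packet and stats types and the sketch_* functions are hypothetical.

```c
#include <stddef.h>

/* Same shape as the kernel's macros in include/linux/compiler.h. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical packet and counters, for illustration only. */
struct sketch_pkt { size_t len; };
struct sketch_stats { unsigned long rx_ok; unsigned long rx_bad; };

/* Rare error path: marked cold and kept out of line so it does not
 * pollute the I-cache lines used by the hot path. */
static __attribute__((cold, noinline)) void sketch_rx_error(struct sketch_stats *s)
{
	s->rx_bad++;
}

/* Hot receive path: unlikely() asks gcc to move the error branch out
 * of the fall-through path. */
void sketch_rx(const struct sketch_pkt *p, struct sketch_stats *s)
{
	if (unlikely(p->len == 0)) {
		sketch_rx_error(s);
		return;
	}
	s->rx_ok++;
}
```

Comparing the generated .s for this function at -O2 and -Os is a quick way to see whether the cold branch still ends up out of the fall-through path.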

There was a recent thread on lkml asking the gcc developers for a
better-behaved -Os, but I don't think it went anywhere.
