Message-ID: <55416BAA.8010504@candelatech.com>
Date:	Wed, 29 Apr 2015 16:39:22 -0700
From:	Ben Greear <greearb@...delatech.com>
To:	netdev <netdev@...r.kernel.org>
Subject: Bad performance on modified pktgen in 4.0 vs 3.17 kernel.

We run a hacked version of pktgen. It has some pkt-rx logic and probably spends more time
grabbing timestamps than the stock code. It also should not busy-spin when sleeping.

You can see pktgen changes, supporting patches, and various other stuff here:

http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-4.0.dev.y/.git;a=summary
git clone git://dmz2.candelatech.com/linux-4.0.dev.y


On a 64-bit Atom system with the e1000 driver, we see around 50% CPU usage
when running 40,000 pkts per second on two interfaces with the 3.17.8+ kernel.

# cat perf-top-3-17.txt
   PerfTop:    3682 irqs/sec  kernel:78.7%  exact:  0.0% [4000Hz cycles],  (all, 4 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

     3.43%  [kernel]       [k] pktgen_thread_worker
     2.47%  libc-2.20.so   [.] __strstr_sse2
     2.31%  [kernel]       [k] e1000_xmit_frame
     2.25%  [kernel]       [k] number.isra.1
     2.18%  [kernel]       [k] vsnprintf
     1.96%  libc-2.20.so   [.] __GI___strcmp_ssse3
     1.84%  [kernel]       [k] format_decode
     1.80%  [kernel]       [k] build_skb
     1.79%  [kernel]       [k] kallsyms_expand_symbol.constprop.1
     1.76%  [kernel]       [k] native_read_tsc
     1.74%  perf           [.] rb_next
     1.57%  [kernel]       [k] getRelativeCurNs
     1.48%  perf           [.] symbols__insert
     1.10%  perf           [.] hex2u64
     1.07%  [kernel]       [k] e1000_irq_enable
     1.06%  [kernel]       [k] timekeeping_get_ns
     1.03%  [kernel]       [k] e1000_clean_rx_irq
     1.00%  [kernel]       [k] __getnstimeofday64
     0.97%  [kernel]       [k] string.isra.6
     0.97%  [kernel]       [k] do_raw_spin_lock
     0.97%  [kernel]       [k] kmem_cache_alloc
     0.94%  [kernel]       [k] e1000_intr_msi


On 4.0, there is significantly more CPU usage. I tried copying pktgen.c from 3.17 to 4.0,
and that did not have any noticeable effect, so I think the problem must be outside of my changes.

# cat perf-top-40.txt
   PerfTop:    4566 irqs/sec  kernel:87.4%  exact:  0.0% [4000Hz cycles],  (all, 4 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    20.72%  [kernel]       [k] mwait_idle_with_hints.constprop.2
    10.98%  [kernel]       [k] __lock_acquire
     3.30%  [kernel]       [k] pktgen_thread_worker
     2.41%  [kernel]       [k] arch_local_save_flags
     2.25%  [kernel]       [k] e1000_xmit_frame
     1.83%  [kernel]       [k] lock_release
     1.57%  [kernel]       [k] lock_acquire
     1.54%  [kernel]       [k] trace_hardirqs_on_caller
     1.50%  libc-2.20.so   [.] __strstr_sse2
     1.41%  [kernel]       [k] number.isra.1
     1.22%  [kernel]       [k] trace_hardirqs_off_caller
     1.20%  [kernel]       [k] kallsyms_expand_symbol.constprop.1
     1.19%  [kernel]       [k] build_skb
     1.18%  [kernel]       [k] format_decode
     1.17%  [kernel]       [k] hlock_class
     1.17%  [kernel]       [k] arch_local_irq_restore
     1.09%  [kernel]       [k] vsnprintf
     1.00%  [kernel]       [k] arch_local_irq_save
     0.97%  libc-2.20.so   [.] __GI___strcmp_ssse3
     0.97%  [kernel]       [k] mark_held_locks
     0.89%  [kernel]       [k] mark_lock
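To make the comparison between the two listings a little more mechanical, here is a minimal Python sketch that parses pasted `perf top` rows and lists the symbols whose share of samples grew. The helper names (`parse_perf_top`, `compare`) and the 0.5-point cutoff are illustrative assumptions, and only the first few rows of each profile are embedded. Keep in mind that `perf top` percentages are shares of the collected samples, so a larger share does not by itself mean more total work.

```python
import re

# One row of `perf top` output: "<pct>%  <dso>  [k|.] <symbol>".
ROW = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+\[([.k])\]\s+(\S+)\s*$")

# First rows of each profile, copied from the listings in this report.
PROFILE_3_17 = """
     3.43%  [kernel]       [k] pktgen_thread_worker
     2.47%  libc-2.20.so   [.] __strstr_sse2
     2.31%  [kernel]       [k] e1000_xmit_frame
     1.80%  [kernel]       [k] build_skb
"""

PROFILE_4_0 = """
    20.72%  [kernel]       [k] mwait_idle_with_hints.constprop.2
    10.98%  [kernel]       [k] __lock_acquire
     3.30%  [kernel]       [k] pktgen_thread_worker
     2.41%  [kernel]       [k] arch_local_save_flags
     2.25%  [kernel]       [k] e1000_xmit_frame
     1.83%  [kernel]       [k] lock_release
     1.57%  [kernel]       [k] lock_acquire
"""


def parse_perf_top(text):
    """Return {symbol: percent} for every row of a pasted `perf top` listing."""
    shares = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            pct, _dso, _mode, sym = m.groups()
            shares[sym] = shares.get(sym, 0.0) + float(pct)
    return shares


def compare(old, new, min_delta=0.5):
    """List (symbol, old %, new %) whose share rose by at least min_delta points.

    Symbols missing from a truncated top-N listing are treated as 0%,
    which overstates the change for symbols just below the cutoff.
    """
    rows = []
    for sym, pct in new.items():
        before = old.get(sym, 0.0)
        if pct - before >= min_delta:
            rows.append((sym, before, pct))
    # Largest increase first.
    return sorted(rows, key=lambda r: r[2] - r[1], reverse=True)


if __name__ == "__main__":
    for sym, before, after in compare(parse_perf_top(PROFILE_3_17),
                                      parse_perf_top(PROFILE_4_0)):
        print(f"{sym:36s} {before:6.2f}% -> {after:6.2f}%")
```

Pasting the full listings instead of the excerpts works the same way; rows that do not match the pattern (the PerfTop header, the dashed rule) are ignored.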


We see a similar jump in CPU usage on the 4.0 kernel when using the 40G Intel NIC/driver
on an E5 system, so it is probably not specific to the driver.

Due to hooks in the pkt rx logic (and changes to the stock kernel code in that area between
3.17 and 4.0), an automated bisect will not be trivial, so I'm hoping not to
have to do one...
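If a bisect does turn out to be necessary, the out-of-tree hooks suggest a three-way verdict per commit. `git bisect run` treats exit status 0 as good, 125 as "cannot be tested, skip", and any other status from 1 to 127 as bad, so commits where the rx patches do not apply can be skipped instead of misclassified. Below is a minimal Python sketch of such a verdict helper, assuming CPU usage is measured separately on the booted test kernel and passed in; the script, its arguments, and the 65% threshold are hypothetical. Since each step needs a reboot into the freshly built kernel, the same verdict could just as well be fed to `git bisect good`, `git bisect bad`, or `git bisect skip` by hand.

```python
import sys

# Exit statuses interpreted by `git bisect run`.
GOOD = 0    # commit does not show the regression
SKIP = 125  # commit cannot be tested (e.g. rx hook patches fail to apply)
BAD = 1     # commit shows the regression

# Hypothetical cutoff above the ~50% CPU usage seen on 3.17.8+;
# adjust to the numbers actually measured on the test machine.
CPU_THRESHOLD_PCT = 65.0


def verdict(patches_applied, cpu_pct, threshold=CPU_THRESHOLD_PCT):
    """Map one bisect step's outcome to a `git bisect run` exit status."""
    if not patches_applied or cpu_pct is None:
        return SKIP
    return GOOD if cpu_pct < threshold else BAD


def main(argv):
    # Hypothetical usage: bisect_verdict.py <patches_applied:0|1> [cpu_percent]
    applied = argv[0] == "1"
    cpu = float(argv[1]) if len(argv) > 1 else None
    return verdict(applied, cpu)


# Only exit when invoked with arguments, so importing the module is harmless.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Treating "patches did not apply" as a skip rather than a failure keeps the hook conflicts from being blamed for the regression.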

I'm curious whether anyone has seen similar performance degradation, and whether anyone
has ideas about what the problem might be.

Thanks,
Ben



-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com
