Message-ID: <1a71d807acf63135bb037c7144fcd8d9@nuclearcat.com>
Date: Sun, 15 Jan 2017 01:05:58 +0200
From: Denys Fedoryshchenko <nuclearcat@...learcat.com>
To: Guillaume Nault <g.nault@...halink.fr>,
Netfilter Devel <netfilter-devel@...r.kernel.org>,
Pablo Neira Ayuso <pablo@...filter.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: 4.9 conntrack performance issues
Hi!
Sorry if I CC'ed anyone wrongly; please let me know if I should remove you.
I started running 4.9 on my NAT successfully several days ago, and the panic
issue seems to have disappeared. But I'm now facing another issue: the
conntrack garbage collector is hogging one of the CPUs.
Here is my data:
2x E5-2640 v3
396G RAM
2x10G (bonded) with approx. 14-15Gbps load at peak time
It handled this load very well on 4.8 and below. It might still be fine, but
I suspect the queues that belong to the hogged CPU may experience issues.
Is there anything that can be done to improve CPU load distribution or reduce
the load on that single core?
net.netfilter.nf_conntrack_buckets = 65536
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 1236021
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_expect_max = 1024
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 0
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 6553600
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 0
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_established = 600
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 20
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 20
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 10
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 20
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 20
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 30
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.nf_conntrack_max = 6553600
These are non-peak values; as an adjustment I use shorter-than-default
timeouts. Changing net.netfilter.nf_conntrack_buckets to a higher value
doesn't fix the issue.
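For context, with the values quoted above the average hash-chain length works out to roughly 18 entries per bucket, which is what gc_worker and __nf_conntrack_find_get have to walk. A minimal sketch of the arithmetic (the runtime resize path in the comment assumes the nf_conntrack hashsize module parameter is writable, as on recent kernels):

```shell
# Average hash-chain depth with the sysctl values quoted above:
count=1236021
buckets=65536
echo $((count / buckets))   # prints 18
# The bucket count can also be raised at runtime (as root); on recent
# kernels writing the module parameter resizes the hash table live:
#   echo 1048576 > /sys/module/nf_conntrack/parameters/hashsize
```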
I noticed that one of the CPUs is hogged (CPU 24 in this case):
Linux 4.9.2-build-0127 (NAT) 01/14/17 _x86_64_ (32 CPU)
23:01:54     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
23:02:04     all    0.09    0.00    1.60    0.01    0.00   28.28    0.00    0.00   70.01
23:02:04       0    0.11    0.00    0.00    0.00    0.00   32.38    0.00    0.00   67.51
23:02:04       1    0.12    0.00    0.12    0.00    0.00   29.91    0.00    0.00   69.86
23:02:04       2    0.23    0.00    0.11    0.00    0.00   29.57    0.00    0.00   70.09
23:02:04       3    0.11    0.00    0.11    0.11    0.00   28.80    0.00    0.00   70.86
23:02:04       4    0.23    0.00    0.11    0.11    0.00   31.41    0.00    0.00   68.14
23:02:04       5    0.11    0.00    0.00    0.00    0.00   29.28    0.00    0.00   70.61
23:02:04       6    0.11    0.00    0.11    0.00    0.00   31.81    0.00    0.00   67.96
23:02:04       7    0.11    0.00    0.11    0.00    0.00   32.69    0.00    0.00   67.08
23:02:04       8    0.00    0.00    0.23    0.00    0.00   42.12    0.00    0.00   57.64
23:02:04       9    0.11    0.00    0.00    0.00    0.00   30.86    0.00    0.00   69.02
23:02:04      10    0.11    0.00    0.11    0.00    0.00   30.93    0.00    0.00   68.84
23:02:04      11    0.00    0.00    0.11    0.00    0.00   32.73    0.00    0.00   67.16
23:02:04      12    0.11    0.00    0.11    0.00    0.00   29.85    0.00    0.00   69.92
23:02:04      13    0.00    0.00    0.00    0.00    0.00   30.96    0.00    0.00   69.04
23:02:04      14    0.00    0.00    0.00    0.00    0.00   30.09    0.00    0.00   69.91
23:02:04      15    0.00    0.00    0.11    0.00    0.00   30.63    0.00    0.00   69.26
23:02:04      16    0.11    0.00    0.00    0.00    0.00   25.88    0.00    0.00   74.01
23:02:04      17    0.11    0.00    0.00    0.00    0.00   22.82    0.00    0.00   77.07
23:02:04      18    0.11    0.00    0.00    0.00    0.00   23.75    0.00    0.00   76.14
23:02:04      19    0.11    0.00    0.11    0.00    0.00   24.86    0.00    0.00   74.92
23:02:04      20    0.11    0.00    0.11    0.11    0.00   24.48    0.00    0.00   75.19
23:02:04      21    0.22    0.00    0.11    0.00    0.00   23.43    0.00    0.00   76.24
23:02:04      22    0.11    0.00    0.11    0.00    0.00   25.46    0.00    0.00   74.32
23:02:04      23    0.00    0.00    0.11    0.00    0.00   25.47    0.00    0.00   74.41
23:02:04      24    0.00    0.00   45.06    0.00    0.00   42.18    0.00    0.00   12.76
23:02:04      25    0.11    0.00    0.11    0.11    0.00   25.22    0.00    0.00   74.46
23:02:04      26    0.11    0.00    0.00    0.11    0.00   23.39    0.00    0.00   76.39
23:02:04      27    0.22    0.00    0.11    0.00    0.00   23.83    0.00    0.00   75.85
23:02:04      28    0.11    0.00    0.11    0.00    0.00   24.10    0.00    0.00   75.68
23:02:04      29    0.11    0.00    0.11    0.00    0.00   23.80    0.00    0.00   75.98
23:02:04      30    0.11    0.00    0.11    0.00    0.00   23.45    0.00    0.00   76.33
23:02:04      31    0.11    0.00    0.11    0.00    0.00   20.37    0.00    0.00   79.42
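As a sanity check on the CPU 24 row: virtually all of its time goes to %sys plus %soft, leaving only ~12.76% idle. The arithmetic from the row's own numbers:

```shell
# Idle share of CPU 24 = 100 - (%usr + %sys + %soft) from the row above
awk 'BEGIN { printf "%.2f\n", 100 - (0.00 + 45.06 + 42.18) }'   # prints 12.76
```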
And this is the output of ./perf top -C 24 -e cycles:
   PerfTop:     933 irqs/sec  kernel:100.0%  exact:  0.0%  [1000Hz cycles],  (all, CPU: 24)
-------------------------------------------------------------------------------
52.68% [nf_conntrack] [k] gc_worker
3.88% [ip_tables] [k] ipt_do_table
2.39% [ixgbe] [k] ixgbe_xmit_frame_ring
2.29% [kernel] [k] _raw_spin_lock
1.84% [ixgbe] [k] ixgbe_poll
1.76% [nf_conntrack] [k] __nf_conntrack_find_get
And perf report for this CPU (same event, cycles):
# Children      Self  Command       Shared Object      Symbol
# ........  ........  ............  .................  ....................
#
    88.98%     0.00%  kworker/24:1  [kernel.kallsyms]  [k] process_one_work
            |
            ---process_one_work
               |
                --54.65%--gc_worker
                          |
                           --3.58%--nf_ct_gc_expired
                                     |
                                     |--1.90%--nf_ct_delete
                                     |          |
                                     |           --1.31%--nf_ct_delete_from_lists
                                     |
                                      --1.61%--nf_conntrack_destroy
                                                destroy_conntrack
                                                |
                                                 --1.53%--nf_conntrack_free
                                                           |
                                                           |--0.80%--kmem_cache_free
                                                           |          |
                                                           |           --0.51%--__slab_free.isra.12
                                                           |
                                                            --0.52%--__nf_ct_ext_destroy
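For completeness, the live entry count can also be read from /proc/net/stat/nf_conntrack, whose first column is the entry count in hex (repeated on every per-CPU line). A small self-contained sketch of decoding it; the sample line below is made up for illustration, not taken from this box:

```shell
# Hypothetical first data line of /proc/net/stat/nf_conntrack;
# the first hex field is the live conntrack entry count:
line='0012dab5 00000000 00000000 00000000 00000021 00000000'
entries=$(printf '%d\n' "0x${line%% *}")
echo "$entries"   # prints 1235637
```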