netdev - softlockups when trying to restore an nft set of 1M entries

Message-ID: <54DDE717.6090703@akamai.com>
Date:	Fri, 13 Feb 2015 05:59:19 -0600
From:	Josh Hunt <johunt@...mai.com>
To:	Thomas Graf <tgraf@...g.ch>,
	Pablo Neira Ayuso <pablo@...filter.org>,
	Patrick McHardy <kaber@...sh.net>
CC:	netfilter-devel@...r.kernel.org, netdev@...r.kernel.org
Subject: softlockups when trying to restore an nft set of 1M entries

While testing nftables sets for our netdev BoF discussion, I came
across a problem: if I try to restore a set of 1M entries, the machine
gets into a softlockup state. Once this is triggered, the system has to
be rebooted.

I can trigger the case by generating a simple nft rules file which 
defines a set of type ipv4_addr. Something like this:

flush ruleset
table ip filter {
         set blackhole {
                 type ipv4_addr
         }
         chain input {
                  type filter hook input priority 0;
         }

         chain forward {
                  type filter hook forward priority 0;
         }

         chain output {
                  type filter hook output priority 0;
         }
}

except inside the set definition above I add 1M random ipv4 addresses. 
Running "nft -f <filename>" will reproduce the problem. I also saw this 
when trying to do a restore of 250k entries.
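
For anyone who wants to reproduce this, here is a minimal sketch of how
such a file could be generated. It is not the script used for the test
above; the function name, the seed, and the choice to keep the
addresses unique are assumptions made for illustration.

```python
import ipaddress
import random


def build_ruleset(n_entries, seed=0):
    """Return an nft ruleset whose 'blackhole' set holds n_entries addresses.

    Expects n_entries >= 1.
    """
    rng = random.Random(seed)
    # Sampling without replacement keeps the addresses unique (an assumption;
    # the original test only says "random").
    addrs = rng.sample(range(1, 2**32 - 1), n_entries)
    # One address per line, aligned under the first element.
    elements = ",\n".join(
        " " * 29 + str(ipaddress.IPv4Address(a)) for a in addrs
    ).lstrip()
    chains = "\n".join(
        f"         chain {hook} {{\n"
        f"                  type filter hook {hook} priority 0;\n"
        f"         }}"
        for hook in ("input", "forward", "output")
    )
    return (
        "flush ruleset\n"
        "table ip filter {\n"
        "         set blackhole {\n"
        "                 type ipv4_addr\n"
        f"                 elements = {{ {elements} }}\n"
        "         }\n"
        f"{chains}\n"
        "}\n"
    )
```

Writing the result of build_ruleset(1000000) to a file (any name will
do) and passing that file to "nft -f" should give a test case of the
same shape as the one described here.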

From what I can tell, there are a few problems here. The first is that
the set defaults to 4 buckets, and during restores the # of buckets
does not increase. I'm currently investigating why we don't expand the
set on restores. However, my guess as to why we're soft-locking up here
is that we're trying to shove 1M entries into 4 buckets :)
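
A back-of-the-envelope estimate of why 4 buckets would hurt, as a rough
sketch. It assumes the hash spreads keys uniformly, all keys are
unique, the table never grows, and every insert first walks the whole
target chain looking for a duplicate (the nft_hash_get ->
rhashtable_lookup_compare -> memcmp path in the trace below suggests
such a lookup happens on add). The function names are made up for
illustration.

```python
def avg_chain_length(n_entries, n_buckets):
    """Expected entries per bucket once all entries are inserted."""
    return n_entries / n_buckets


def total_compares(n_entries, n_buckets):
    """Expected key comparisons over the whole restore.

    When the i-th entry (counting from 0) is added, the table holds i
    entries, so the chain being walked has about i / n_buckets entries.
    Summing i / n_buckets for i = 0 .. n_entries - 1 gives
    n_entries * (n_entries - 1) / (2 * n_buckets).
    """
    return n_entries * (n_entries - 1) // (2 * n_buckets)


if __name__ == "__main__":
    for buckets in (4, 2**20):
        print(buckets,
              avg_chain_length(1_000_000, buckets),
              total_compares(1_000_000, buckets))
```

Under these assumptions, 1M entries in 4 buckets means chains of about
250k entries and on the order of 1.25e11 comparisons for the restore,
versus fewer than 1M comparisons with 2^20 buckets.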

Second, the user has no way to tune the # of initial buckets. My
patchset "nft hash set expansion fixes" fixes this. If I tune the hash
to use a reasonable # of buckets for 1M entries, I do not see the
softlockup problem.
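
As an illustration of what "reasonable" could mean, here is a small
sketch that picks the smallest power-of-two bucket count keeping the
average load at or below a target. The 0.75 target and the
power-of-two rounding are assumptions for this example, not values
taken from the patchset.

```python
def buckets_for(n_entries, max_load=0.75):
    """Smallest power-of-two bucket count with n_entries / buckets <= max_load."""
    buckets = 1
    while n_entries > buckets * max_load:
        buckets *= 2
    return buckets
```

For 1M entries this gives 2^21 buckets, i.e. chains of well under one
entry on average.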

I ran these tests using the current net-next.

Here's some of the softlockup output. Let me know if you'd like more 
info, etc.

[  328.092675] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! 
[nft:3921]
[  328.100185] Modules linked in: nft_hash nft_rbtree nf_tables_ipv4 
nf_tables nfnetlink iptable_filter ip_tables x_tables dm_crypt 
ipmi_devintf ipmi_msghandler i2c_dev ipv6 coretemp hwmon bnx2x ptp 
pps_core i2c_i801 lpc_ich i2c_core mfd_core crc32c_generic crc32c_intel 
ie31200_edac libcrc32c edac_core mdio ext4 jbd2 crc16 raid10 raid456 
async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
raid1 raid0 linear md_mod dm_mod ahci libahci libata mpt2sas 
scsi_transport_sas raid_class
[  328.151902] CPU: 4 PID: 3921 Comm: nft Not tainted 3.19.0-rc7+ #28
[  328.158542] Hardware name: CIARA TECHNOLOGIES 1X8-X6 SSD 16G 
10GE/S5530WG2NR-LE-2T-AKA, BIOS 7.008 14/04/2014
[  328.169289] task: ffff880407266210 ti: ffff880400ff0000 task.ti: 
ffff880400ff0000
[  328.177609] RIP: 0010:[<ffffffff8134dd41>]  [<ffffffff8134dd41>] 
memcmp+0x11/0x50
[  328.186043] RSP: 0018:ffff880400ff38d8  EFLAGS: 00000202
[  328.191811] RAX: 00000000000000f4 RBX: ffff88040f000340 RCX: 
00000000000000e3
[  328.199407] RDX: 0000000000000004 RSI: ffff880400ff39f0 RDI: 
ffff8803f37ce7e8
[  328.207000] RBP: ffff880400ff38d8 R08: 00000000000000d9 R09: 
00000000ffffffdf
[  328.214593] R10: 0000000000000015 R11: dead000000100100 R12: 
000412d000000010
[  328.222189] R13: 00000040�000000b R14: ffffffff000492d0 R15: 
ffff880400ff3928
[  328.229781] FS:  00007f7ddf1d6700(0000) GS:ffff88041fd00000(0000) 
knlGS:0000000000000000
[  328.238709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  328.244909] CR2: 00007f3b0d890000 CR3: 000000040ae41000 CR4: 
00000000001407e0
[  328.252505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[  328.260100] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[  328.267692] Stack:
[  328.270171]  ffff880400ff3908 ffffffffa056160a ffff880400ff38f8 
ffff8800379b2290
[  328.278805]  ffffffffa05615d0 ffff880400ff3968 ffff880400ff3958 
ffffffff8135a25d
[  328.287437]  ffff88040c86a300 0495cff0a054a125 0000000000000000 
ffff8800379b2200
[  328.296070] Call Trace:
[  328.298983]  [<ffffffffa056160a>] nft_hash_compare+0x3a/0x88 [nft_hash]
[  328.306054]  [<ffffffffa05615d0>] ? nft_hash_lookup+0x60/0x60 [nft_hash]
[  328.313218]  [<ffffffff8135a25d>] rhashtable_lookup_compare+0x6d/0xb0
[  328.320118]  [<ffffffffa0561560>] nft_hash_get+0x30/0x40 [nft_hash]
[  328.326846]  [<ffffffffa054a4d4>] nft_add_set_elem+0x164/0x3b0 
[nf_tables]
[  328.334180]  [<ffffffffa0546fdc>] ? nft_trans_set_add+0x2c/0xa0 
[nf_tables]
[  328.341602]  [<ffffffffa0561000>] ? 0xffffffffa0561000
[  328.347205]  [<ffffffffa054d85f>] ? nf_tables_newset+0x7df/0x8d0 
[nf_tables]
[  328.354711]  [<ffffffff8136ca52>] ? nla_strcmp+0x42/0x50
[  328.360489]  [<ffffffffa0546b14>] ? nf_tables_table_lookup+0x44/0x80 
[nf_tables]
[  328.368723]  [<ffffffffa054da1e>] nf_tables_newsetelem+0xce/0x170 
[nf_tables]
[  328.376316]  [<ffffffffa054093c>] nfnetlink_rcv_batch+0x33c/0x430 
[nfnetlink]
[  328.383913]  [<ffffffffa05406ed>] ? nfnetlink_rcv_batch+0xed/0x430 
[nfnetlink]
[  328.391974]  [<ffffffffa0540abf>] nfnetlink_rcv+0x8f/0xc8 [nfnetlink]
[  328.398876]  [<ffffffff81568a92>] netlink_unicast+0x182/0x210
[  328.405082]  [<ffffffff81568f58>] netlink_sendmsg+0x378/0x3e0
[  328.411295]  [<ffffffff8151ec2f>] do_sock_sendmsg+0x8f/0xa0
[  328.417327]  [<ffffffff8151ec50>] sock_sendmsg+0x10/0x20
[  328.423097]  [<ffffffff81521655>] ___sys_sendmsg+0x315/0x330
[  328.429216]  [<ffffffff810daacc>] ? acct_account_cputime+0x1c/0x20
[  328.435859]  [<ffffffff81078f5d>] ? account_system_time+0x9d/0x190
[  328.442502]  [<ffffffff81078a55>] ? local_clock+0x25/0x30
[  328.448364]  [<ffffffff8109faf8>] ? rcu_eqs_enter+0x68/0x90
[  328.454399]  [<ffffffff810daacc>] ? acct_account_cputime+0x1c/0x20
[  328.461042]  [<ffffffff81078eb1>] ? account_user_time+0x91/0xa0
[  328.467423]  [<ffffffff81522469>] __sys_sendmsg+0x49/0x90
[  328.473287]  [<ffffffff81616dfd>] ? int_check_syscall_exit_work+0x34/0x3d
[  328.480534]  [<ffffffff815224c9>] SyS_sendmsg+0x19/0x20
[  328.486223]  [<ffffffff81616bd2>] system_call_fastpath+0x12/0x17
[  328.492690] Code: c3 66 0f 1f 84 00 00 00 00 00 31 c0 c6 06 00 5d c3 
66 0f 1f 84 00 00 00 00 00 55 31 c0 48 85 d2 48 89 e5 74 2f 0f b6 07 0f 
b6 0e <29> c8 75 25 48 83 ea 01 31 c9 eb 18 0f 1f 00 44 0f b6 4c 0f 01
[  331.718616] INFO: rcu_sched self-detected stall on CPU[  331.720614] 
INFO: rcu_sched detected stalls on CPUs/tasks: { 4} (detected by 0, 
t=30002 jiffies, g=6997, c=6996, q=0)
[  331.720617] Task dump for CPU 4:
[  331.720618] nft             R  running task        0  3921   3876 
0x00080008
[  331.720620]  ffff88041fffad80 000000000001a5e8 000000000000003e 
000000000000003f
[  331.720621]  0000000000000000 ffff8803f41ac000 ffff88040f000340 
0000000000000000
[  331.720622]  0000000000000000 ffff88040f0012c0 ffff88040f000340 
ffff880400ff3818
[  331.720623] Call Trace:
[  331.720625]  [<ffffffff8116d593>] ? kmem_getpages+0xb3/0x110
[  331.720629]  [<ffffffff8116ec26>] ? cache_grow+0x146/0x210
[  331.720630]  [<ffffffff8134dd3e>] ? memcmp+0xe/0x50
[  331.720634]  [<ffffffff8136ccf0>] ? nla_parse+0x90/0x110
[  331.720636]  [<ffffffffa056160a>] ? nft_hash_compare+0x3a/0x88 [nft_hash]
[  331.720638]  [<ffffffffa05615d0>] ? nft_hash_lookup+0x60/0x60 [nft_hash]
[  331.720639]  [<ffffffff8135a25d>] ? rhashtable_lookup_compare+0x6d/0xb0
[  331.720641]  [<ffffffffa0561560>] ? nft_hash_get+0x30/0x40 [nft_hash]
[  331.720642]  [<ffffffffa054a4d4>] ? nft_add_set_elem+0x164/0x3b0 
[nf_tables]
[  331.720645]  [<ffffffffa0546fdc>] ? nft_trans_set_add+0x2c/0xa0 
[nf_tables]
[  331.720647]  [<ffffffffa0561000>] ? 0xffffffffa0561000
[  331.720654]  [<ffffffffa054d85f>] ? nf_tables_newset+0x7df/0x8d0 
[nf_tables]
[  331.720656]  [<ffffffff8136ca52>] ? nla_strcmp+0x42/0x50
[  331.720657]  [<ffffffffa0546b14>] ? nf_tables_table_lookup+0x44/0x80 
[nf_tables]
[  331.720659]  [<ffffffffa054da1e>] ? nf_tables_newsetelem+0xce/0x170 
[nf_tables]
[  331.720661]  [<ffffffffa054093c>] ? nfnetlink_rcv_batch+0x33c/0x430
[nfnetlink]
[  331.720663]  [<ffffffffa05406ed>] ? nfnetlink_rcv_batch+0xed/0x430 
[nfnetlink]
[  331.720664]  [<ffffffffa0540abf>] ? nfnetlink_rcv+0x8f/0xc8 [nfnetlink]
[  331.720665]  [<ffffffff81568a92>] ? netlink_unicast+0x182/0x210
[  331.720668]  [<ffffffff81568f58>] ? netlink_sendmsg+0x378/0x3e0
[  331.720670]  [<ffffffff8151ec2f>] ? do_sock_sendmsg+0x8f/0xa0
[  331.720672]  [<ffffffff8151ec50>] ? sock_sendmsg+0x10/0x20
[  331.720673]  [<ffffffff81521655>] ? ___sys_sendmsg+0x315/0x330
[  331.720675]  [<ffffffff810daacc>] ? acct_account_cputime+0x1c/0x20
[  331.720677]  [<ffffffff81078f5d>] ? account_system_time+0x9d/0x190
[  331.720679]  [<ffffffff81078a55>] ? local_clock+0x25/0x30
[  331.720680]  [<ffffffff8109faf8>] ? rcu_eqs_enter+0x68/0x90
[  331.720683]  [<ffffffff810daacc>] ? acct_account_cputime+0x1c/0x20
[  331.720684]  [<ffffffff81078eb1>] ? account_user_time+0x91/0xa0
[  331.720685]  [<ffffffff81522469>] ? __sys_sendmsg+0x49/0x90
[  331.720687]  [<ffffffff81616dfd>] ? int_check_syscall_exit_work+0x34/0x3d
[  331.720690]  [<ffffffff815224c9>] ? SyS_sendmsg+0x19/0x20
[  331.720691]  [<ffffffff81616bd2>] ? system_call_fastpath+0x12/0x17

Thanks
Josh