Message-ID: <20191212132132.GL8621@gauss3.secunet.de>
Date: Thu, 12 Dec 2019 14:21:32 +0100
From: Steffen Klassert <steffen.klassert@...unet.com>
To: Josh Hunt <johunt@...mai.com>
CC: <herbert@...dor.apana.org.au>, David Miller <davem@...emloft.net>, netdev <netdev@...r.kernel.org>
Subject: Re: crash in __xfrm_state_lookup on 4.19 LTS

On Wed, Dec 11, 2019 at 02:52:41PM -0800, Josh Hunt wrote:
> We've hit the following crash on a handful of machines recently running
> 4.19.55 LTS and strongswan. The kernels running on these machines do have
> some patches on top of 4.19 LTS, but nothing in the area of xfrm/ipsec:
>
> [54284.354997] general protection fault: 0000 [#1] SMP PTI
> [54284.355504] CPU: 6 PID: 11937 Comm: charon Tainted: G O L
> 4.19.55-4.19.2.4-amd64-2b86b5ea31726254 #1
> [54284.356382] Hardware name: Ciara Technologies 1x8-X6 SSD 32G
> 10GE/CangJie, BIOS CC1F110D 08/12/2014
> [54284.357322] RIP: 0010:__xfrm_state_lookup+0x7f/0x110
> [54284.357856] Code: d0 4a 8d 04 c0 48 8b 00 48 85 c0 74 68 41 89 cf 49 89
> d6 41 89 f5 eb 09 48 8b 43 28 48 85 c0 74 54 48 83 e8 28 48 89 c3 74 4b <66>
> 3b a8 d2 00 00 00 75 e5 44 3b 78 50
> 75 df 44 3a 60 54 75 d9 66
> [54284.359190] RSP: 0018:ffffab5043d93ad0 EFLAGS: 00010212
> [54284.359748] RAX: 6174735f79636e3d RBX: 6174735f79636e3d RCX:
> 0000000064959bc7
> [54284.360219] RDX: ffff9bb0593c3380 RSI: 0000000000000000 RDI:
> ffffffff951071c0
> [54284.360713] RBP: 0000000000000002 R08: 0000000000000010 R09:
> 00000000001b950d
> [54284.361209] R10: 000000000000003f R11: 0000000096001849 R12:
> 0000000000000032
> [54284.361755] R13: 0000000000000000 R14: ffff9bb0593c3380 R15:
> 0000000064959bc7
> [54284.362255] FS: 00007facd7b01700(0000) GS:ffff9bb07fb80000(0000)
> knlGS:00000000000000000
> [54284.363198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [54284.363687] CR2: 00007f99250e89e0 CR3: 00000007e1078006 CR4:
> 00000000001606e0
> [54284.364156] Call Trace:
> [54284.364642] xfrm_state_add+0x108/0x290
> [54284.365113] xfrm_add_sa+0x9e6/0xb28 [xfrm_user]
> [54284.365580] ? xfrm_user_rcv_msg+0x183/0x1a0 [xfrm_user]
> [54284.366077] xfrm_user_rcv_msg+0x183/0x1a0 [xfrm_user]
> [54284.366543] ? xfrm_dump_sa_done+0x30/0x30 [xfrm_user]
> [54284.367040] netlink_rcv_skb+0xde/0x110
> [54284.367504] xfrm_netlink_rcv+0x30/0x40 [xfrm_user]
> [54284.368000] netlink_unicast+0x191/0x230
> [54284.368463] netlink_sendmsg+0x2c4/0x390
> [54284.368958] sock_sendmsg+0x36/0x40
> [54284.369449] __sys_sendto+0xd8/0x150
> [54284.369940] ? kern_select+0xb9/0xe0
> [54284.370405] __x64_sys_sendto+0x24/0x30
> [54284.370946] do_syscall_64+0x4e/0x110
> [54284.383941] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [54284.384497] RIP: 0033:0x7face4679ad3
>
> (gdb) list *(__xfrm_state_lookup+0x7f)
> 0xffffffff8271beaf is in __xfrm_state_lookup (net/xfrm/xfrm_state.c:841).
> warning: Source file is more recent than executable.
> 836 {
> 837     unsigned int h = xfrm_spi_hash(net, daddr, spi, proto, family);
> 838     struct xfrm_state *x;
> 839
> 840     hlist_for_each_entry_rcu(x, net->xfrm.state_byspi + h, byspi) {
> 841         if (x->props.family != family ||
> 842             x->id.spi != spi ||
> 843             x->id.proto != proto ||
> 844             !xfrm_addr_equal(&x->id.daddr, daddr, family))
> 845             continue;
>
> The above looks similar to these very old reports:
> https://wiki.strongswan.org/issues/2147
> https://bugzilla.kernel.org/show_bug.cgi?id=84961
>
> Prior to the crash we are seeing softlockups and rcu stalls (see attached
> netconsole log file.) The RIP in those stalls/lockups appears to be in the
> same area as the crash reported above, lines 840 and 841.
>
> I've tried reproducing the problem in our lab, but have been unsuccessful so
> far and running the latest upstream kernel in production to see if that
> resolves the issue is not possible at the moment. It's very possible this
> crash was happening on earlier kernel versions in our network, I just don't
> have any data to confirm that.

Do you have any possibility to reproduce this on v4.19.55?

__xfrm_state_lookup() is called from process context and protected by
rcu_read_lock(). But updates to the above list can happen in softirq
context, so it seems we should disable BHs to prevent being interrupted
by a softirq that updates the list.
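
A rough sketch of the idea above, written as a userspace model and not as a kernel patch. It assumes the lookup would be wrapped in rcu_read_lock_bh()/rcu_read_unlock_bh() so that a softirq cannot modify the chain while a process-context reader walks it. The functions below with those names are hypothetical stand-ins that only track nesting depth, and struct demo_state, demo_softirq_insert() and demo_state_lookup() are illustrative names, not xfrm code. Whether disabling BHs actually addresses the reported crash is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct xfrm_state: a single hash chain keyed
 * by (family, spi, proto) only. Hypothetical, for illustration. */
struct demo_state {
    struct demo_state *next;
    uint16_t family;
    uint32_t spi;
    uint8_t proto;
};

static struct demo_state *chain_head;     /* the "byspi" chain */
static struct demo_state *pending_insert; /* update held back while BHs are off */
static int bh_disable_depth;              /* models the BH-disable nesting count */

static void demo_chain_add(struct demo_state *x)
{
    x->next = chain_head;
    chain_head = x;
}

/* Userspace stand-ins for the kernel primitives of the same name.
 * They only model "softirqs cannot run while the depth is non-zero". */
static void rcu_read_lock_bh(void)
{
    bh_disable_depth++;
}

static void rcu_read_unlock_bh(void)
{
    /* Re-enabling BHs lets the held-back "softirq" work run. */
    if (--bh_disable_depth == 0 && pending_insert) {
        demo_chain_add(pending_insert);
        pending_insert = NULL;
    }
}

/* Models a softirq that wants to add an entry to the chain.
 * Returns 1 if the update ran immediately, 0 if it was deferred
 * because a reader currently has BHs disabled. */
static int demo_softirq_insert(struct demo_state *x)
{
    if (bh_disable_depth > 0) {
        pending_insert = x;
        return 0;
    }
    demo_chain_add(x);
    return 1;
}

/* Process-context lookup: the whole chain walk runs with BHs disabled,
 * so the modelled softirq cannot change the chain underneath it. */
static struct demo_state *demo_state_lookup(uint16_t family, uint32_t spi,
                                            uint8_t proto)
{
    struct demo_state *x, *found = NULL;

    rcu_read_lock_bh();
    for (x = chain_head; x; x = x->next) {
        if (x->family != family || x->spi != spi || x->proto != proto)
            continue;
        found = x;
        break;
    }
    rcu_read_unlock_bh();
    return found;
}
```

In this model an insert attempted between rcu_read_lock_bh() and rcu_read_unlock_bh() is held back until the reader leaves its critical section, which is the ordering the suggestion is after.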