Message-ID: <ccf5c987-fe17-0465-0f4a-fdac984c25ab@gmail.com>
Date: Sat, 10 Nov 2018 20:34:34 +0100
From: Jean-Philippe Menil <jpmenil@...il.com>
To: steffen.klassert@...unet.com
Cc: herbert@...dor.apana.org.au, davem@...emloft.net,
netdev@...r.kernel.org, kuznet@....inr.ac.ru,
yoshfuji@...ux-ipv6.org
Subject: [BUG] xfrm: unable to handle kernel NULL pointer dereference
Hi guys,
We have been seeing unexpected crashes on kernels 4.15 through 4.18.17,
on several VPN hosts that use IPsec VTI interfaces, ever since we upgraded from 4.4.
The offending oops, taken on 4.18, is attached.
Output of decodecode:
[ 37.134864] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00
00 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe
<f6> 40 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20
All code
========
0: 8b 44 24 70 mov 0x70(%rsp),%eax
4: 0f c8 bswap %eax
6: 89 87 b4 00 00 00 mov %eax,0xb4(%rdi)
c: 48 8b 86 20 05 00 00 mov 0x520(%rsi),%rax
13: 8b 80 f8 14 00 00 mov 0x14f8(%rax),%eax
19: 85 c0 test %eax,%eax
1b: 75 05 jne 0x22
1d: 48 85 d2 test %rdx,%rdx
20: 74 0e je 0x30
22: 48 8b 43 58 mov 0x58(%rbx),%rax
26: 48 83 e0 fe and $0xfffffffffffffffe,%rax
2a:* f6 40 38 04 testb $0x4,0x38(%rax) <-- trapping instruction
2e: 74 7d je 0xad
30: 44 89 b3 b4 00 00 00 mov %r14d,0xb4(%rbx)
37: 49 8b 44 24 20 mov 0x20(%r12),%rax
3c: 48 rex.W
3d: 39 .byte 0x39
3e: 86 20 xchg %ah,(%rax)
Code starting with the faulting instruction
===========================================
0: f6 40 38 04 testb $0x4,0x38(%rax)
4: 74 7d je 0x83
6: 44 89 b3 b4 00 00 00 mov %r14d,0xb4(%rbx)
d: 49 8b 44 24 20 mov 0x20(%r12),%rax
12: 48 rex.W
13: 39 .byte 0x39
14: 86 20 xchg %ah,(%rax)
If my understanding is correct, we fail here:
/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:
1169 return (!net->xfrm.policy_count[dir] && !skb->sp) ||
0x0000000000000b19 <+185>: testb $0x4,0x38(%rax)
0x0000000000000b1d <+189>: je 0xb9c <vti_rcv_cb+316>
(gdb) list *0x0000000000000b19
0xb19 is in vti_rcv_cb (/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:1169).
1164 int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0);
1165
1166 if (sk && sk->sk_policy[XFRM_POLICY_IN])
1167 return __xfrm_policy_check(sk, ndir, skb, family);
1168
1169 return (!net->xfrm.policy_count[dir] && !skb->sp) ||
1170 (skb_dst(skb)->flags & DST_NOPOLICY) ||
1171 __xfrm_policy_check(sk, ndir, skb, family);
1172 }
1173
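For reference, here is a minimal userspace sketch of what the trapping instruction appears to compute, matching the `mov 0x58(%rbx),%rax` / `and $0xfffffffffffffffe,%rax` / `testb $0x4,0x38(%rax)` sequence above. It is only a model, not the kernel code: the struct and function names are hypothetical, and the constant values (SKB_DST_NOREF as bit 0, DST_NOPOLICY as 0x4) are my assumptions, chosen because they agree with the mask and the tested bit in the disassembly. If the masked dst pointer is NULL, the kernel would read from address 0x38, which would be consistent with a NULL pointer dereference at that instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values, chosen to match the disassembly (and ~1, testb $0x4). */
#define SKB_DST_NOREF   ((uintptr_t)1)
#define SKB_DST_PTRMASK (~SKB_DST_NOREF)
#define DST_NOPOLICY    0x0004

/* Hypothetical stand-ins for struct dst_entry and struct sk_buff. */
struct dst_model {
    unsigned short flags;   /* read at offset 0x38 in the real struct */
};

struct skb_model {
    uintptr_t refdst;       /* read at offset 0x58 in the real struct */
};

/* Models skb_dst(): strip the "no reference held" bit from the pointer. */
static struct dst_model *skb_dst_model(const struct skb_model *skb)
{
    return (struct dst_model *)(skb->refdst & SKB_DST_PTRMASK);
}

/*
 * Models the (skb_dst(skb)->flags & DST_NOPOLICY) term from line 1170.
 * Returns 1 or 0 for a valid dst, and -1 where the kernel, which has no
 * NULL check at this point, would fault.
 */
static int nopolicy_check_model(const struct skb_model *skb)
{
    struct dst_model *dst = skb_dst_model(skb);

    if (dst == NULL)
        return -1;          /* kernel: read of 0x38(NULL) -> oops */
    return (dst->flags & DST_NOPOLICY) != 0;
}
```

In this model, an skb whose refdst holds only the NOREF bit, or nothing at all, reaches the -1 branch; the real code dereferences the pointer unconditionally.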
I am having a hard time understanding why the skb seems to be freed twice.
I cannot reproduce the bug in the lab, but it happens regularly in production,
and it seems to depend on the workload.
Any help would be appreciated.
Let me know if you need further information.
Regards,
Jean-Philippe
View attachment "oops.txt" of type "text/plain" (14750 bytes)