Message-ID: <ccf5c987-fe17-0465-0f4a-fdac984c25ab@gmail.com>
Date:   Sat, 10 Nov 2018 20:34:34 +0100
From:   Jean-Philippe Menil <jpmenil@...il.com>
To:     steffen.klassert@...unet.com
Cc:     herbert@...dor.apana.org.au, davem@...emloft.net,
        netdev@...r.kernel.org, kuznet@....inr.ac.ru,
        yoshfuji@...ux-ipv6.org
Subject: [BUG] xfrm: unable to handle kernel NULL pointer dereference

Hi guys,

We're seeing unexpected crashes on kernels 4.15 through 4.18.17 on several
VPN hosts that use IPsec VTI interfaces. They started after we upgraded
from 4.4.

Attached is the offending oops from 4.18.

Output of decodecode:

[ 37.134864] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 
00 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe 
<f6> 40 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20
All code
========
    0:   8b 44 24 70             mov    0x70(%rsp),%eax
    4:   0f c8                   bswap  %eax
    6:   89 87 b4 00 00 00       mov    %eax,0xb4(%rdi)
    c:   48 8b 86 20 05 00 00    mov    0x520(%rsi),%rax
   13:   8b 80 f8 14 00 00       mov    0x14f8(%rax),%eax
   19:   85 c0                   test   %eax,%eax
   1b:   75 05                   jne    0x22
   1d:   48 85 d2                test   %rdx,%rdx
   20:   74 0e                   je     0x30
   22:   48 8b 43 58             mov    0x58(%rbx),%rax
   26:   48 83 e0 fe             and    $0xfffffffffffffffe,%rax
   2a:*  f6 40 38 04             testb  $0x4,0x38(%rax)          <-- trapping instruction
   2e:   74 7d                   je     0xad
   30:   44 89 b3 b4 00 00 00    mov    %r14d,0xb4(%rbx)
   37:   49 8b 44 24 20          mov    0x20(%r12),%rax
   3c:   48                      rex.W
   3d:   39                      .byte 0x39
   3e:   86 20                   xchg   %ah,(%rax)

Code starting with the faulting instruction
===========================================
    0:   f6 40 38 04             testb  $0x4,0x38(%rax)
    4:   74 7d                   je     0x83
    6:   44 89 b3 b4 00 00 00    mov    %r14d,0xb4(%rbx)
    d:   49 8b 44 24 20          mov    0x20(%r12),%rax
   12:   48                      rex.W
   13:   39                      .byte 0x39
   14:   86 20                   xchg   %ah,(%rax)
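
The `<f6>` marker in the Code: line is the byte at which the CPU trapped, and counting the bytes in front of it gives the `2a` offset that decodecode flags above. A minimal sketch of that count, assuming a hypothetical helper named `trap_offset()` (it is not part of the kernel scripts):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return the offset of the <xx>-marked byte in an oops "Code:" line,
 * or -1 if the line carries no marker. Hypothetical helper, for
 * illustration only.
 */
static int trap_offset(const char *line)
{
    assert(line != NULL);

    /* Skip the timestamp and the "Code:" prefix when present, so that
     * their hex-looking characters are not counted as code bytes. */
    const char *p = strstr(line, "Code:");
    p = p ? p + 5 : line;

    int count = 0;
    while (*p) {
        if (*p == '<')
            return count;          /* bytes seen so far == trap offset */
        if (isxdigit((unsigned char)p[0]) && isxdigit((unsigned char)p[1])) {
            count++;               /* one two-digit hex byte */
            p += 2;
        } else {
            p++;                   /* whitespace or other separator */
        }
    }
    return -1;
}
```

Fed the Code: line from this oops, the helper should return 0x2a, which is the offset of the `testb` in the listing.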


If my understanding is correct, we fail here:

/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:
1169            return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
    0x0000000000000b19 <+185>:   testb  $0x4,0x38(%rax)
    0x0000000000000b1d <+189>:   je     0xb9c <vti_rcv_cb+316>

(gdb) list *0x0000000000000b19
0xb19 is in vti_rcv_cb (/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:1169).
1164            int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0);
1165
1166            if (sk && sk->sk_policy[XFRM_POLICY_IN])
1167                    return __xfrm_policy_check(sk, ndir, skb, family);
1168
1169            return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
1170                    (skb_dst(skb)->flags & DST_NOPOLICY) ||
1171                    __xfrm_policy_check(sk, ndir, skb, family);
1172    }
1173
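
For reference, the faulting sequence can be modelled in userspace. The trapping `testb $0x4,0x38(%rax)` comes right after `and $0xfffffffffffffffe,%rax`, which looks like the `skb_dst(skb)->flags & DST_NOPOLICY` term on line 1170: the low bit of the skb's dst reference is masked off, then the flags are read. What follows is only a sketch under stated assumptions. `mock_skb`, `mock_dst` and both helpers are hypothetical stand-ins, not the kernel definitions, and the constants are the values I believe 4.18 uses. If no dst is attached, the masked pointer is 0, which would be consistent with a NULL pointer dereference at that instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the 4.18-era kernel headers; verify against your tree. */
#define DST_NOPOLICY     0x0004
#define SKB_DST_NOREF    ((uintptr_t)1)
#define SKB_DST_PTRMASK  (~SKB_DST_NOREF)

/* Simplified stand-ins: only the fields this sketch needs. */
struct mock_dst { unsigned short flags; };
struct mock_skb { uintptr_t refdst; /* dst pointer, bit 0 = "noref" tag */ };

/* Mirrors the masking step: strip the tag bit to recover the pointer. */
static struct mock_dst *mock_skb_dst(const struct mock_skb *skb)
{
    assert(skb != NULL);
    return (struct mock_dst *)(skb->refdst & SKB_DST_PTRMASK);
}

/*
 * Returns 1 if DST_NOPOLICY is set, 0 if it is clear, and -1 if the skb
 * has no dst at all. The -1 branch marks the case where an unguarded
 * dst->flags read would dereference NULL; it is here to make that case
 * visible, not as a proposed kernel fix.
 */
static int mock_dst_nopolicy(const struct mock_skb *skb)
{
    struct mock_dst *dst = mock_skb_dst(skb);

    if (!dst)
        return -1;
    return (dst->flags & DST_NOPOLICY) ? 1 : 0;
}
```

In this model an skb whose `refdst` is 0 takes the -1 branch, while one that points at a dst with the flag set returns 1 whether or not the tag bit is set.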

I really have a hard time understanding why the skb seems to be freed twice.

I'm not able to reproduce the bug in the lab, but it happens regularly in
production and seems to depend on the workload.

Any help will be appreciated.

Let me know if you need further information.

Regards,

Jean-Philippe

View attachment "oops.txt" of type "text/plain" (14750 bytes)
