Message-ID: <CAEA6p_B7JYUhV+QB+7EWcs74oUz8VYDQgtmKiNHef5vsEBi7Lw@mail.gmail.com>
Date:   Wed, 5 Sep 2018 10:09:57 -0700
From:   Wei Wang <weiwan@...gle.com>
To:     songliubraving@...com
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        David Ahern <dsahern@...il.com>,
        Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: BUG: unable to handle kernel paging request in fib6_node_lookup_1

On Tue, Sep 4, 2018 at 11:11 PM Song Liu <songliubraving@...com> wrote:
>
> We are debugging an issue with fib6_node_lookup_1().
>
> We use a 4.16-based kernel, and we have backported most upstream
> patches in ip6_fib.{c,h}. The only major differences I can spot are
>
> 8b7f2731bd68d83940714ce92381d1a72596407c
> c3506372277779fccbffee2475400fcd689d5738
>
> I guess the issue is not related to these two fixes.
>
> After staring at the call trace and disassembly code (attached below)
> I guess this is a use-after-free issue in (or right after) the lookup
> loop:
>
>         for (;;) {
>                 struct fib6_node *next;
>
>                 dir = addr_bit_set(args->addr, fn->fn_bit);
>
>                 next = dir ? rcu_dereference(fn->right) :
>                              rcu_dereference(fn->left);
>
>                 if (next) {
>                         fn = next;
>                         continue;
>                 }
>                 break;
>         }
>
> I guess this probably also happens on the latest upstream. I haven't
> tested this with an upstream kernel (or the net tree) yet, because we
> can only trigger this about once a week on 100 servers.
>
> Does this look familiar? Any comments and/or suggestions are highly
> appreciated.
>
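For readers following the thread, the descent quoted above can be sketched in userspace C. This is only an illustration under stated assumptions: `toy_node`, `toy_addr_bit_set` and `toy_descend` are hypothetical names, the struct is a stand-in that keeps only the fields the loop touches, and there is no RCU here, so it shows the bit-test walk but not the lifetime rules that a use-after-free would violate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fib6_node: only what the loop touches. */
struct toy_node {
	struct toy_node *left;
	struct toy_node *right;
	uint16_t bit;	/* index of the address bit tested here, 0 = most significant */
};

/* Test bit `bit` of a 128-bit address stored as 16 bytes in network order. */
static int toy_addr_bit_set(const uint8_t addr[16], unsigned int bit)
{
	assert(bit < 128);
	return (addr[bit >> 3] >> (7 - (bit & 7))) & 1;
}

/*
 * Walk down the tree, going right when the tested bit is set and left
 * otherwise, until the chosen child is NULL. Returns the deepest node reached.
 * In the kernel each child load is an rcu_dereference(); a plain load is used
 * here because this sketch is single-threaded.
 */
static struct toy_node *toy_descend(struct toy_node *fn, const uint8_t addr[16])
{
	for (;;) {
		struct toy_node *next;

		next = toy_addr_bit_set(addr, fn->bit) ? fn->right : fn->left;
		if (!next)
			break;
		fn = next;
	}
	return fn;
}
```

Every iteration reads `fn->bit` from whatever pointer the previous iteration loaded, so a stale or corrupted child pointer would be dereferenced on the very next pass through the loop.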
From a glance at the commit logs, I don't think any changes were made
recently to the core logic of fib6_node handling.
(There were a couple of fixes regarding fib6_info, but I don't think they
are the cause here... It is still good to check whether you have commits
9b0a8da8c4c6, e873e4b9cc7e, e70a3aad44cc in your build.)

I also went through the call path and did not find anything obviously wrong...
I think it's best for you to reproduce it so we can debug further.
One question: do you have "CONFIG_IPV6_SUBTREE" enabled, and do you specify
a src IP in the routing table?

Thanks.
Wei

> Thanks,
> Song
>
>
> Bug stack trace:
>
> [354764.457916] BUG: unable to handle kernel
> [354764.466125] paging request
> [354764.471720]  at 00000000f60fc318
> [354764.478360] IP: fib6_node_lookup_1+0x29/0x130
> [354764.487249] PGD 800000010f725067
> [354764.494062] P4D 800000010f725067
> [354764.500878] PUD 0
> [354764.505087] Oops: 0000 [#1] SMP PTI
> [354764.512245] Modules linked in:
> [354764.518536]  udp_diag
> [354764.523266]  act_gact
> [354764.527997]  cls_bpf
> [354764.532557]  tcp_diag
> [354764.537291]  inet_diag
> [354764.542200]  nfsv3
> [354764.546409]  nfs
> [354764.550273]  fscache
> [354764.554834]  ip6table_raw
> [354764.560260]  ip6table_filter
> [354764.566208]  xt_DSCP
> [354764.570765]  iptable_raw
> [354764.576020]  iptable_filter
> [354764.581790]  ip6table_mangle
> [354764.587738]  iptable_mangle
> [354764.593505]  sb_edac
> [354764.598058]  x86_pkg_temp_thermal
> [354764.604872]  intel_powerclamp
> [354764.610992]  coretemp
> [354764.615723]  kvm_intel
> [354764.620628]  kvm
> [354764.624494]  irqbypass
> [354764.629399]  iTCO_wdt
> [354764.634132]  iTCO_vendor_support
> [354764.640772]  i2c_i801
> [354764.645507]  lpc_ich
> [354764.650064]  efivars
> [354764.654619]  mfd_core
> [354764.659353]  ipmi_si
> [354764.663911]  ipmi_devintf
> [354764.669341]  ipmi_msghandler
> [354764.675281]  acpi_cpufreq
> [354764.680711]  button
> [354764.685096]  sch_fq_codel
> [354764.690520]  nfsd
> [354764.694557]  nfs_acl
> [354764.699118]  lockd
> [354764.703330]  auth_rpcgss
> [354764.708588]  oid_registry
> [354764.714006]  grace
> [354764.718213]  sunrpc
> [354764.722590]  fuse
> [354764.726626]  loop
> [354764.730661]  efivarfs
> [354764.735395]  autofs4
> [354764.739957] CPU: 5 PID: 3460038 Comm: java Not tainted 4.16.0-14_fbk2_1455_g6bcb99c57db6 #14
> [354764.756996] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM03   06/02/2016
> [354764.773001] RIP: 0010:fib6_node_lookup_1+0x29/0x130
> [354764.782929] RSP: 0018:ffffc9003f0bb730 EFLAGS: 00010206
> [354764.793557] RAX: ffff883fc131a000 RBX: 00000000f60fc300 RCX: 00000000ffffffe4
> [354764.807999] RDX: 0000000000000010 RSI: 0000000000000001 RDI: ffffc9003f0bb8f0
> [354764.822436] RBP: ffffc9003f0bb750 R08: 0000000000000002 R09: 0000000000000004
> [354764.836877] R10: ffffc9003f0bb7a8 R11: ffff883ff7795780 R12: ffffffff82305080
> [354764.851317] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
> [354764.865765] FS:  00007f8defcfc700(0000) GS:ffff881fff940000(0000) knlGS:0000000000000000
> [354764.882119] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [354764.893800] CR2: 00000000f60fc318 CR3: 0000000f68cae006 CR4: 00000000003606e0
> [354764.908235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [354764.922671] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [354764.937109] Call Trace:
> [354764.942195]  fib6_node_lookup+0x67/0x90
> [354764.950042]  ? fib6_table_lookup+0x43/0x2f0
> [354764.958587]  fib6_table_lookup+0x43/0x2f0
> [354764.966794]  ip6_pol_route+0x43/0x360
> [354764.974294]  ? ip6_pol_route_input+0x20/0x20
> [354764.983016]  fib6_rule_lookup+0x85/0x140
> [354764.991050]  ? ip6t_do_table+0x331/0x6b0
> [354764.999089]  ? ip6_route_output_flags+0xa3/0xc0
> [354765.008342]  ip6_route_me_harder+0xab/0x280
> [354765.016889]  ip6table_mangle_hook+0xd4/0x110 [ip6table_mangle]
> [354765.028754]  ? nf_hook_slow+0x43/0xc0
> [354765.036269]  nf_hook_slow+0x43/0xc0
> [354765.043445]  nf_hook+0x6e/0xc0
> [354765.049731]  ? ac6_proc_exit+0x20/0x20
> [354765.057412]  ip6_xmit+0x28a/0x500
> [354765.064225]  ? ac6_proc_exit+0x20/0x20
> [354765.071902]  ? inet6_csk_route_socket+0x10f/0x1c0
> [354765.081495]  ? update_group_capacity+0x23/0x1e0
> [354765.090749]  inet6_csk_xmit+0x82/0xd0
> [354765.098277]  tcp_transmit_skb+0x51d/0x9d0
> [354765.106495]  tcp_write_xmit+0x1bd/0xf40
> [354765.114359]  ? _copy_from_iter_full+0x9c/0x240
> [354765.123444]  tcp_sendmsg_locked+0x2c2/0xdd0
> [354765.131991]  tcp_sendmsg+0x27/0x40
> [354765.138991]  sock_sendmsg+0x36/0x40
> [354765.146167]  sock_write_iter+0x84/0xd0
>
>
> Disassembly of fib6_node_lookup_1:
> Dump of assembler code for function fib6_node_lookup_1:
>    0xffffffff818b3c70 <+0>:     callq  0xffffffff81a01610 <__fentry__>
>    0xffffffff818b3c75 <+5>:     mov    (%rsi),%eax
>    0xffffffff818b3c77 <+7>:     test   %eax,%eax
>    0xffffffff818b3c79 <+9>:     je     0xffffffff818b3d94 <fib6_node_lookup_1+292>
>    0xffffffff818b3c7f <+15>:    push   %r12
>    0xffffffff818b3c81 <+17>:    push   %rbp
>    0xffffffff818b3c82 <+18>:    mov    %rsi,%rbp
>    0xffffffff818b3c85 <+21>:    push   %rbx
>    0xffffffff818b3c86 <+22>:    mov    %rdi,%rbx
>    0xffffffff818b3c89 <+25>:    mov    0x8(%rsi),%rdi
>    0xffffffff818b3c8d <+29>:    mov    $0x1,%esi
>    0xffffffff818b3c92 <+34>:    movzwl 0x28(%rbx),%ecx
>    0xffffffff818b3c96 <+38>:    mov    %esi,%edx
>    0xffffffff818b3c98 <+40>:    mov    %ecx,%eax
>    0xffffffff818b3c9a <+42>:    xor    $0xffffffe7,%ecx
>    0xffffffff818b3c9d <+45>:    sar    $0x5,%eax
>    0xffffffff818b3ca0 <+48>:    shl    %cl,%edx
>    0xffffffff818b3ca2 <+50>:    cltq
>    0xffffffff818b3ca4 <+52>:    test   %edx,(%rdi,%rax,4)
>    0xffffffff818b3ca7 <+55>:    je     0xffffffff818b3cb7 <fib6_node_lookup_1+71>
>    0xffffffff818b3ca9 <+57>:    mov    0x10(%rbx),%rax
>    0xffffffff818b3cad <+61>:    test   %rax,%rax
>    0xffffffff818b3cb0 <+64>:    je     0xffffffff818b3cc0 <fib6_node_lookup_1+80>
>    0xffffffff818b3cb2 <+66>:    mov    %rax,%rbx
>    0xffffffff818b3cb5 <+69>:    jmp    0xffffffff818b3c92 <fib6_node_lookup_1+34>
>    0xffffffff818b3cb7 <+71>:    mov    0x8(%rbx),%rax
>    0xffffffff818b3cbb <+75>:    test   %rax,%rax
>    0xffffffff818b3cbe <+78>:    jne    0xffffffff818b3cb2 <fib6_node_lookup_1+66>
>    0xffffffff818b3cc0 <+80>:    test   %rbx,%rbx
>    0xffffffff818b3cc3 <+83>:    je     0xffffffff818b3d17 <fib6_node_lookup_1+167>
>    0xffffffff818b3cc5 <+85>:    mov    $0xffffffffffffffff,%r12
>    0xffffffff818b3ccc <+92>:    jmp    0xffffffff818b3d02 <fib6_node_lookup_1+146>
>    0xffffffff818b3cce <+94>:    mov    0x20(%rbx),%rax
>    0xffffffff818b3cd2 <+98>:    test   %rax,%rax
>    0xffffffff818b3cd5 <+101>:   je     0xffffffff818b3cf2 <fib6_node_lookup_1+130>
>    0xffffffff818b3cd7 <+103>:   movslq 0x0(%rbp),%rdx
>    0xffffffff818b3cdb <+107>:   mov    0x8(%rbp),%rsi
>    0xffffffff818b3cdf <+111>:   add    %rdx,%rax
>    0xffffffff818b3ce2 <+114>:   mov    0x10(%rax),%edx
>    0xffffffff818b3ce5 <+117>:   cmp    $0x3f,%edx
>    0xffffffff818b3ce8 <+120>:   jbe    0xffffffff818b3d1e <fib6_node_lookup_1+174>
>    0xffffffff818b3cea <+122>:   mov    (%rsi),%rcx
>    0xffffffff818b3ced <+125>:   cmp    %rcx,(%rax)
>    0xffffffff818b3cf0 <+128>:   je     0xffffffff818b3d52 <fib6_node_lookup_1+226>
>    0xffffffff818b3cf2 <+130>:   movzwl 0x2a(%rbx),%eax
>    0xffffffff818b3cf6 <+134>:   test   $0x2,%al
>    0xffffffff818b3cf8 <+136>:   jne    0xffffffff818b3d17 <fib6_node_lookup_1+167>
>    0xffffffff818b3cfa <+138>:   mov    (%rbx),%rbx
>
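As a rough sanity check on the register dump above, the arithmetic on the faulting address can be written out in C. The values used in the usage note are taken from the oops (RBX, RAX and CR2); the helper names and the `KERNEL_HALF_START` constant are assumptions for illustration, and the constant assumes x86-64 with 4-level paging. That RBX held `fn` at the faulting instruction is also an assumption, based on the quoted disassembly using %rbx as the node pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64 with 48-bit virtual addresses, where kernel addresses
 * live in the upper canonical half. */
#define KERNEL_HALF_START 0xffff800000000000ULL

/* Returns nonzero if v falls in the kernel half of the address space. */
static int looks_like_kernel_ptr(uint64_t v)
{
	return v >= KERNEL_HALF_START;
}

/* Offset of the faulting address from a candidate base register. */
static uint64_t fault_offset(uint64_t cr2, uint64_t base)
{
	assert(cr2 >= base);
	return cr2 - base;
}
```

With the values from the trace, `fault_offset(0x00000000f60fc318, 0x00000000f60fc300)` is 0x18, and `looks_like_kernel_ptr(0x00000000f60fc300)` is false while `looks_like_kernel_ptr(0xffff883fc131a000)` (RAX) is true. So the access faulted 0x18 bytes past a base value that does not look like a kernel pointer at all, which may help narrow down where the bad value was loaded from, though it does not by itself confirm or rule out a use-after-free.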
