netdev - Re: BUG: unable to handle kernel paging request in fib6_node_lookup_1


Message-ID: <6414B055-F56A-4F58-BA11-7C45F72FABCB@fb.com>
Date:   Wed, 5 Sep 2018 18:10:26 +0000
From:   Song Liu <songliubraving@...com>
To:     Wei Wang <weiwan@...gle.com>
CC:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        David Ahern <dsahern@...il.com>,
        Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: BUG: unable to handle kernel paging request in fib6_node_lookup_1



> On Sep 5, 2018, at 10:09 AM, Wei Wang <weiwan@...gle.com> wrote:
> 
> On Tue, Sep 4, 2018 at 11:11 PM Song Liu <songliubraving@...com> wrote:
>> 
>> We are debugging an issue with fib6_node_lookup_1().
>> 
>> We use a 4.16-based kernel, and we have backported most upstream
>> patches in ip6_fib.{c,h}. The only major differences I can spot are
>> 
>> 8b7f2731bd68d83940714ce92381d1a72596407c
>> c3506372277779fccbffee2475400fcd689d5738
>> 
>> I guess the issue is not related to these two fixes.
>> 
>> After staring at the call trace and disassembly code (attached below)
>> I guess this is a use-after-free issue in (or right after) the lookup
>> loop:
>> 
>>        for (;;) {
>>                struct fib6_node *next;
>> 
>>                dir = addr_bit_set(args->addr, fn->fn_bit);
>> 
>>                next = dir ? rcu_dereference(fn->right) :
>>                             rcu_dereference(fn->left);
>> 
>>                if (next) {
>>                        fn = next;
>>                        continue;
>>                }
>>                break;
>>        }
>> 
>> I guess this probably also happens to latest upstream. I haven't
>> tested this with upstream kernel (or net tree) yet, because we
>> can only trigger this about once a week on 100 servers.
>> 
>> Does this look familiar? Any comments and/or suggestions are highly
>> appreciated.
>> 
> By glancing at the commit logs, I don't think any changes were made
> regarding the core logic of fib6_node handling recently.
> (There were a couple of fixes regarding fib6_info but I don't think it
> is the cause here... But it is still good to check if you have commit
> 9b0a8da8c4c6, e873e4b9cc7e, e70a3aad44cc in your build.)

Looks like we don't have e70a3aad44cc. I think it fixes a memory leak 
(instead of a use-after-free)? Let me add it and run some tests anyway. 
Thanks a lot for this information. 

> 
> I also went through the call path and did not find anything obviously wrong...
> I think it's best for you to reproduce it and we can debug further.
> One question is, do you have "CONFIG_IPV6_SUBTREE" enabled and specify
> src IP in the routing table?

We do have CONFIG_IPV6_SUBTREE enabled. But we usually do not specify
src IP in the routing table. 

Let me try to reproduce it. 

Thanks again,
Song
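
For readers following the thread, the descent loop quoted above can be
sketched in userspace as below. This is only an illustration under stated
assumptions: struct node, addr_bit() and descend() are hypothetical names,
not the kernel's struct fib6_node or addr_bit_set(), and the bit numbering
(bit 0 is the most significant bit of the first address byte) is assumed.
The sketch reads the child pointers directly instead of through
rcu_dereference(), so it shows the traversal only and cannot reproduce the
suspected use-after-free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a trie node; not the kernel layout. */
struct node {
	unsigned int bit;	/* index of the address bit tested here */
	struct node *left;	/* followed when the tested bit is 0 */
	struct node *right;	/* followed when the tested bit is 1 */
};

/*
 * Return 1 if bit number `bit` of a 128-bit address is set.
 * Assumption: bit 0 is the most significant bit of addr[0].
 */
static int addr_bit(const uint8_t addr[16], unsigned int bit)
{
	assert(bit < 128);
	return (addr[bit >> 3] >> (7 - (bit & 7))) & 1;
}

/*
 * Walk down from `root`, choosing a child by the bit each node tests,
 * and return the deepest node reached (the last one with no child in
 * the chosen direction).
 */
static struct node *descend(struct node *root, const uint8_t addr[16])
{
	struct node *fn = root;

	assert(fn != NULL);
	for (;;) {
		struct node *next = addr_bit(addr, fn->bit) ? fn->right
							    : fn->left;
		if (!next)
			break;
		fn = next;
	}
	return fn;
}
```

In this sketch, every node visited must stay valid for the whole walk. In
the kernel, that guarantee has to come from the RCU read-side section around
the lookup, which is where the use-after-free guess in the thread points.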