[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5f95d746-27d0-a7c9-9cff-0cc60b7c1c73@gmail.com>
Date: Tue, 8 Sep 2020 11:41:13 -0600
From: David Ahern <dsahern@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
Ben Greear <greearb@...delatech.com>
Subject: Re: [PATCH net] ipv6: avoid lockdep issue in fib6_del()
On 9/8/20 11:06 AM, Eric Dumazet wrote:
> On Tue, Sep 8, 2020 at 6:50 PM David Ahern <dsahern@...il.com> wrote:
>>
>> On 9/8/20 2:20 AM, Eric Dumazet wrote:
>>> syzbot reported twice a lockdep issue in fib6_del() [1]
>>> which I think is caused by net->ipv6.fib6_null_entry
>>> having a NULL fib6_table pointer.
>>>
>>> fib6_del() already checks for fib6_null_entry special
>>> case, we only need to return earlier.
>>>
>>> Bug seems to occur very rarely, I have thus chosen
>>> a 'bug origin' that makes backports not too complex.
>>>
>>> [1]
>>> WARNING: suspicious RCU usage
>>> 5.9.0-rc4-syzkaller #0 Not tainted
>>> -----------------------------
>>> net/ipv6/ip6_fib.c:1996 suspicious rcu_dereference_protected() usage!
>>>
>>> other info that might help us debug this:
>>>
>>> rcu_scheduler_active = 2, debug_locks = 1
>>> 4 locks held by syz-executor.5/8095:
>>> #0: ffffffff8a7ea708 (rtnl_mutex){+.+.}-{3:3}, at: ppp_release+0x178/0x240 drivers/net/ppp/ppp_generic.c:401
>>> #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: spin_trylock_bh include/linux/spinlock.h:414 [inline]
>>> #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: fib6_run_gc+0x21b/0x2d0 net/ipv6/ip6_fib.c:2312
>>> #2: ffffffff89bd6a40 (rcu_read_lock){....}-{1:2}, at: __fib6_clean_all+0x0/0x290 net/ipv6/ip6_fib.c:2613
>>> #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline]
>>> #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: __fib6_clean_all+0x107/0x290 net/ipv6/ip6_fib.c:2245
>>>
>>> stack backtrace:
>>> CPU: 1 PID: 8095 Comm: syz-executor.5 Not tainted 5.9.0-rc4-syzkaller #0
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>> Call Trace:
>>> __dump_stack lib/dump_stack.c:77 [inline]
>>> dump_stack+0x198/0x1fd lib/dump_stack.c:118
>>> fib6_del+0x12b4/0x1630 net/ipv6/ip6_fib.c:1996
>>> fib6_clean_node+0x39b/0x570 net/ipv6/ip6_fib.c:2180
>>> fib6_walk_continue+0x4aa/0x8e0 net/ipv6/ip6_fib.c:2102
>>> fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2150
>>> fib6_clean_tree+0xdb/0x120 net/ipv6/ip6_fib.c:2230
>>> __fib6_clean_all+0x120/0x290 net/ipv6/ip6_fib.c:2246
>>
>> This is walking a table and __fib6_clean_all takes the lock for the
>> table (and you can see that above), so puzzling how fib6_del can be
>> called for an entry with NULL fib6_table.
>
> So you think the test for (rt == net->ipv6.fib6_null_entry)
> should be replaced by
>
> BUG_ON(rt == net->ipv6.fib6_null_entry); ?
>
BUG_ON does not seem right.
Backing out to the callers, why does fib6_clean_node not catch that it
is the root of the table and abort the walk or at least not try to
remove the root? This might be related to the problem Ben has complained
about many times.
If syzbot has only triggered it a few times then I presume no reproducer.
Powered by blists - more mailing lists