netdev - Re: WARNING: bad unlock balance in sch_direct

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAMArcTVPrKrhY63P=VgFuTQf0wUNO_9=H2R96p08-xoJ+mbZ5w@mail.gmail.com>
Date:   Wed, 8 Jan 2020 20:42:58 +0900
From:   Taehee Yoo <ap420073@...il.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     syzbot <syzbot+4ec99438ed7450da6272@...kaller.appspotmail.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: WARNING: bad unlock balance in sch_direct_xmit

On Wed, 8 Jan 2020 at 09:34, Cong Wang <xiyou.wangcong@...il.com> wrote:
>
> On Tue, Jan 7, 2020 at 3:31 AM Taehee Yoo <ap420073@...il.com> wrote:
> > After "ip link set team0 master team1", the "team1 -> team0" locking path
> > will be recorded in lockdep key of both team1 and team0.
> > Then, if "ip link set team1 master team0" is executed, "team0 -> team1"
> > locking path also will be recorded in lockdep key. At this moment,
> > lockdep will catch possible deadlock situation and it prints the above
> > warning message. But, both "team0 -> team1" and "team1 -> team0"
> > will not be existing concurrently. so the above message is actually wrong.
> > In order to avoid this message, a recorded locking path should be
> > removed. So, both lockdep_unregister_key() and lockdep_register_key()
> > are needed.
> >
>
> So, after you move the key down to each netdevice, they are now treated
> as different locks. Is this stacked device scenario the reason why you
> move it to per-netdevice? If so, I wonder why not just use nested locks?
> Like:
>
> netif_addr_nested_lock(upper, 0);
> netif_addr_nested_lock(lower, 1);
> netif_addr_nested_unlock(lower);
> netif_addr_nested_unlock(upper);
>
> For this case, they could still share a same key.
>
> Thanks for the details!

Yes, the reason for using dynamic lockdep key is to avoid lockdep
warning in stacked device scenario.
But, the addr_list_lock case is a little bit different.
There was a bug in netif_addr_lock_nested() that
"dev->netdev_ops->ndo_get_lock_subclass" isn't updated after "master"
and "nomaster" command.
So, the wrong subclass is used, so lockdep warning message was printed.
There were some ways to fix this problem, using dynamic key is just one
of them. I think using the correct subclass in netif_addr_lock_nested()
is also a correct way to fix that problem. Another minor reason was that
the subclass is limited by 8. but dynamic key has no limitation.

Unfortunately, dynamic key has a problem too.
lockdep limits the maximum number of lockdep keys.
   $cat /proc/lockdep_stats
   lock-classes:                         1228 [max: 8192]

So, If so many network interfaces are created, they use so many
lockdep keys. If so, lockdep will stop.
This is the cause of "BUG: MAX_LOCKDEP_KEYS too low!".

Thank you!
Taehee Yoo