Message-ID: <CACT4Y+Y9knRot2GUG706AK68XzrOwHpC6EhK0xVFFNKX=V4q7A@mail.gmail.com>
Date: Tue, 14 Jan 2020 11:18:55 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: syzbot <syzbot+aaa6fa4949cc5d9b7b25@...kaller.appspotmail.com>,
Ingo Molnar <mingo@...nel.org>,
Will Deacon <will.deacon@....com>,
LKML <linux-kernel@...r.kernel.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: BUG: MAX_LOCKDEP_CHAINS too low!
On Tue, Jan 14, 2020 at 9:43 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Jan 09, 2020 at 11:59:25AM +0100, Dmitry Vyukov wrote:
>
> > Or are there some ID leaks in lockdep? syzbot has a bunch of very
> > simple reproducers for these bugs, so it's not a maximally diverse
> > load. And I think I saw these bugs in large numbers when testing
> > just a single subsystem too, e.g. netfilter.
>
> Can you share one of the simple ones with me? A .c file I can run on
> my regular test box that should make it go *splat*?
>
> Often in the past hitting these limits was the result of some
> particularly poor annotation.
>
> For instance, locks in per-cpu data used to trigger this, since
> static locks don't need explicit {mutex,spin_lock}_init() calls and
> instead use their (static) address. This worked fine for global state,
> but per-cpu is an exception: there it causes an nr_cpus explosion in
> lockdep state, because you get nr_cpus different addresses.
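(An aside for readers following the archive: below is a minimal sketch
of the pattern Peter describes. The lock name and function are invented
for illustration and are not from any real driver.)

    #include <linux/spinlock.h>
    #include <linux/percpu.h>

    /* Statically initialized per-CPU lock: there is no explicit
     * spin_lock_init() call, so lockdep used to key the class off
     * the lock's address. */
    static DEFINE_PER_CPU(spinlock_t, demo_lock) =
            __SPIN_LOCK_UNLOCKED(demo_lock);

    static void take_on_each_cpu(void)
    {
            int cpu;

            for_each_possible_cpu(cpu) {
                    spinlock_t *l = per_cpu_ptr(&demo_lock, cpu);
                    /* Each per-CPU copy has a distinct address, so
                     * before commit 383776fa7527 lockdep registered
                     * nr_cpus separate classes for what is logically
                     * a single lock. */
                    spin_lock(l);
                    spin_unlock(l);
            }
    }
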
>
> Now, we fixed that particular issue:
>
> 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly")
>
> but maybe there's something else going on.
>
> Just blindly bumping the number without analysis of what exactly is
> happening is never a good idea.
Each of these has a fair number of C reproducers:
BUG: MAX_LOCKDEP_KEYS too low!
https://syzkaller.appspot.com/bug?id=8a18efe79140782a88dcd098808d6ab20ed740cc
BUG: MAX_LOCKDEP_ENTRIES too low!
https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b
BUG: MAX_LOCKDEP_CHAINS too low!
https://syzkaller.appspot.com/bug?id=bf037f4725d40a8d350b2b1b3b3e0947c6efae85
BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
https://syzkaller.appspot.com/bug?id=381cb436fe60dc03d7fd2a092b46d7f09542a72a
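(For context: each of these messages corresponds to a fixed,
compile-time limit, defined in kernel/locking/lockdep_internals.h and
include/linux/lockdep.h. The sketch below gives the approximate
definitions from a ~v5.5 tree, reproduced from memory; check the exact
tree for the real values:)

    /* approximate definitions, ~v5.5 */
    #define MAX_LOCKDEP_ENTRIES      32768UL
    #define MAX_LOCKDEP_CHAINS_BITS  16
    #define MAX_LOCKDEP_CHAINS       (1UL << MAX_LOCKDEP_CHAINS_BITS)
    #define MAX_LOCKDEP_CHAIN_HLOCKS (MAX_LOCKDEP_CHAINS*5)
    #define MAX_LOCKDEP_KEYS_BITS    13
    #define MAX_LOCKDEP_KEYS         (1UL << MAX_LOCKDEP_KEYS_BITS)
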
I just took the latest reproducer for one of these bugs:
https://syzkaller.appspot.com/text?tag=ReproC&x=15a096aee00000
run on next-20200114 with this config:
https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-kasan.config
and got:
root@...kaller:~# ./a.out
[ 110.512842][ T6486] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
[ 110.518638][ T6486] turning off the locking correctness validator.
[ 110.520779][ T6486] CPU: 3 PID: 6486 Comm: kworker/u9:1 Not tainted 5.5.0-rc6-next-20200114 #1
[ 110.522496][ T6486] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebu4
[ 110.524367][ T6486] Workqueue: hci0 hci_power_on
[ 110.525126][ T6486] Call Trace:
[ 110.525633][ T6486] dump_stack+0x199/0x216
[ 110.526294][ T6486] __lock_acquire.cold.64+0x18/0x385
[ 110.527084][ T6486] ? lock_downgrade+0x900/0x900
[ 110.527942][ T6486] ? __kasan_check_write+0x14/0x20
[ 110.528716][ T6486] ? do_raw_spin_lock+0x132/0x2e0
[ 110.529499][ T6486] ? mark_held_locks+0x130/0x130
[ 110.530284][ T6486] ? kfree_skbmem+0xfb/0x1c0
[ 110.531020][ T6486] ? kfree_skbmem+0xfb/0x1c0
[ 110.531748][ T6486] ? hci_sock_dev_event+0x335/0x580
[ 110.532578][ T6486] lock_acquire+0x194/0x410
[ 110.533305][ T6486] ? skb_dequeue+0x22/0x180
[ 110.534028][ T6486] _raw_spin_lock_irqsave+0x99/0xd0
[ 110.534873][ T6486] ? skb_dequeue+0x22/0x180
[ 110.535581][ T6486] skb_dequeue+0x22/0x180
[ 110.536263][ T6486] skb_queue_purge+0x26/0x40
[ 110.536996][ T6486] ? vhci_open+0x310/0x310
[ 110.537794][ T6486] vhci_flush+0x3b/0x50
[ 110.538481][ T6486] hci_dev_do_close+0x5c8/0x1070
[ 110.539297][ T6486] ? hci_dev_open+0x290/0x290
[ 110.540068][ T6486] hci_power_on+0x1d8/0x690
[ 110.540805][ T6486] ? hci_error_reset+0xf0/0xf0
[ 110.541603][ T6486] ? rcu_read_lock_sched_held+0x9c/0xd0
[ 110.542504][ T6486] ? rcu_read_lock_any_held.part.11+0x50/0x50
[ 110.543478][ T6486] ? trace_hardirqs_on+0x67/0x240
[ 110.544300][ T6486] process_one_work+0x9b5/0x1750
[ 110.545138][ T6486] ? pwq_dec_nr_in_flight+0x320/0x320
[ 110.545976][ T6486] ? lock_acquire+0x194/0x410
[ 110.546708][ T6486] worker_thread+0x8b/0xd20
[ 110.547414][ T6486] ? process_one_work+0x1750/0x1750
[ 110.548317][ T6486] kthread+0x365/0x450
[ 110.548960][ T6486] ? kthread_mod_delayed_work+0x1b0/0x1b0
[ 110.549867][ T6486] ret_from_fork+0x24/0x30
There are probably some clusters of reproducers; I see several that
use /dev/vhci. But there may be other types of reproducers
(potentially for different issues).
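(For illustration: the /dev/vhci reproducers are, at their core,
open/close cycles on the virtual HCI device; each open instantiates an
hci device and each close tears it down again, which matches the
hci_power_on/hci_dev_do_close path in the trace above. A minimal sketch
of that shape, not one of the actual syzkaller reproducers:)

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
            for (;;) {
                    /* Opening /dev/vhci creates a virtual hci device. */
                    int fd = open("/dev/vhci", O_RDWR);
                    if (fd < 0)
                            continue;
                    usleep(10000);  /* give the hci device time to come up */
                    close(fd);      /* tears the device down again */
            }
    }
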