Date:   Mon, 30 Oct 2017 22:34:47 +0300
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     syzbot 
        <bot+e24d104216808d90da4c438a6f38a239217c605b@...kaller.appspotmail.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs@...glegroups.com,
        kasan-dev <kasan-dev@...glegroups.com>
Subject: Re: possible deadlock in process_one_work

On Mon, Oct 30, 2017 at 10:32 PM, syzbot
<bot+e24d104216808d90da4c438a6f38a239217c605b@...kaller.appspotmail.com>
wrote:
> Hello,
>
> syzkaller hit the following crash on
> 9506597de2cde02d48c11d5c250250b9143f59f7
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
>
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.13.0-rc6-next-20170824+ #8 Not tainted
> ------------------------------------------------------
> kworker/0:2/1313 is trying to acquire lock:
>  ((shepherd).work){+.+.}, at: [<ffffffff8146458c>]
> process_one_work+0xb2c/0x1be0 kernel/workqueue.c:2094
>
> but now in release context of a crosslock acquired at the following:
>  ((complete)&rcu.completion){+.+.}, at: [<ffffffff815afff5>]
> __synchronize_srcu+0x1b5/0x250 kernel/rcu/srcutree.c:898
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 ((complete)&rcu.completion){+.+.}:
>        check_prevs_add kernel/locking/lockdep.c:2020 [inline]
>        validate_chain kernel/locking/lockdep.c:2469 [inline]
>        __lock_acquire+0x3286/0x4620 kernel/locking/lockdep.c:3498
>        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
>        complete_acquire include/linux/completion.h:39 [inline]
>        __wait_for_common kernel/sched/completion.c:108 [inline]
>        wait_for_common kernel/sched/completion.c:122 [inline]
>        wait_for_completion+0xc8/0x770 kernel/sched/completion.c:143
>        __synchronize_srcu+0x1b5/0x250 kernel/rcu/srcutree.c:898
>        synchronize_srcu_expedited kernel/rcu/srcutree.c:923 [inline]
>        synchronize_srcu+0x1a3/0x560 kernel/rcu/srcutree.c:974

This looks like an issue with KASAN, which unexpectedly calls
synchronize_srcu from kmem_cache_shrink. So +kasan-dev, Tejun, Lai to CC.
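
For context, a minimal sketch of where the cpu_hotplug_lock.rw_sem ->
(complete)&rcu.completion edge in the chain above comes from. This is
illustrative only: the function names are taken from the trace in the
report, but the body is a simplification, not a verbatim copy of
mm/slab_common.c:

	int kmem_cache_shrink(struct kmem_cache *cachep)
	{
		int ret;

		get_online_cpus();          /* takes cpu_hotplug_lock.rw_sem  */
		kasan_cache_shrink(cachep); /* -> quarantine_remove_cache()   */
		                            /*    -> synchronize_srcu()       */
		                            /*       -> wait_for_completion() */
		                            /*          on &rcu.completion    */
		ret = __kmem_cache_shrink(cachep);
		put_online_cpus();
		return ret;
	}

The cycle then closes on the release side: the completion is signalled
by wakeme_after_rcu() from srcu_invoke_callbacks() running on the
events_power_efficient workqueue (see the "stack backtrace" at the
bottom of the report), and crossrelease ties that release to work
execution, i.e. to (shepherd).work, which in turn takes
cpu_hotplug_lock via vmstat_shepherd().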


>        quarantine_remove_cache+0xd7/0xf0 mm/kasan/quarantine.c:327
>        kasan_cache_shrink+0x9/0x10 mm/kasan/kasan.c:380
>        kmem_cache_shrink+0x15/0x30 mm/slab_common.c:857
>        acpi_os_purge_cache+0x15/0x20 drivers/acpi/osl.c:1560
>        acpi_purge_cached_objects+0x38/0xc9 drivers/acpi/acpica/utxface.c:271
>        acpi_initialize_objects+0xc5/0x112 drivers/acpi/acpica/utxfinit.c:302
>        acpi_bus_init drivers/acpi/bus.c:1131 [inline]
>        acpi_init+0x23c/0x8e6 drivers/acpi/bus.c:1220
>        do_one_initcall+0x9e/0x330 init/main.c:826
>        do_initcall_level init/main.c:892 [inline]
>        do_initcalls init/main.c:900 [inline]
>        do_basic_setup init/main.c:918 [inline]
>        kernel_init_freeable+0x469/0x521 init/main.c:1066
>        kernel_init+0x13/0x172 init/main.c:993
>        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>
> -> #1 (cpu_hotplug_lock.rw_sem){++++}:
>        check_prevs_add kernel/locking/lockdep.c:2020 [inline]
>        validate_chain kernel/locking/lockdep.c:2469 [inline]
>        __lock_acquire+0x3286/0x4620 kernel/locking/lockdep.c:3498
>        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
>        percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:35
> [inline]
>        percpu_down_read include/linux/percpu-rwsem.h:58 [inline]
>        cpus_read_lock+0x42/0x90 kernel/cpu.c:218
>        get_online_cpus include/linux/cpu.h:126 [inline]
>        vmstat_shepherd+0x3d/0x1b0 mm/vmstat.c:1707
>        process_one_work+0xbfd/0x1be0 kernel/workqueue.c:2098
>        worker_thread+0x223/0x1860 kernel/workqueue.c:2233
>        kthread+0x39c/0x470 kernel/kthread.c:231
>        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>
> -> #0 ((shepherd).work){+.+.}:
>        process_one_work+0xba5/0x1be0 kernel/workqueue.c:2095
>        worker_thread+0x223/0x1860 kernel/workqueue.c:2233
>        kthread+0x39c/0x470 kernel/kthread.c:231
>        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>        0xffffffffffffffff
>
> other info that might help us debug this:
>
> Chain exists of:
>   (shepherd).work --> cpu_hotplug_lock.rw_sem --> (complete)&rcu.completion
>
>  Possible unsafe locking scenario by crosslock:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(cpu_hotplug_lock.rw_sem);
>   lock((complete)&rcu.completion);
>                                lock((shepherd).work);
>                                unlock((complete)&rcu.completion);
>
>  *** DEADLOCK ***
>
> 3 locks held by kworker/0:2/1313:
>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
> __write_once_size include/linux/compiler.h:305 [inline]
>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
> atomic64_set arch/x86/include/asm/atomic64_64.h:33 [inline]
>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
> atomic_long_set include/asm-generic/atomic-long.h:56 [inline]
>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
> set_work_data kernel/workqueue.c:617 [inline]
>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
> set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
> process_one_work+0xad4/0x1be0 kernel/workqueue.c:2090
>  #1:  ((&(&sdp->work)->work)){+.+.}, at: [<ffffffff8146458c>]
> process_one_work+0xb2c/0x1be0 kernel/workqueue.c:2094
>  #2:  (&x->wait#5){....}, at: [<ffffffff81524c68>] complete+0x18/0x80
> kernel/sched/completion.c:34
>
> stack backtrace:
> CPU: 0 PID: 1313 Comm: kworker/0:2 Not tainted 4.13.0-rc6-next-20170824+ #8
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: events_power_efficient srcu_invoke_callbacks
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  print_circular_bug+0x503/0x710 kernel/locking/lockdep.c:1259
>  check_prev_add+0x865/0x1520 kernel/locking/lockdep.c:1894
>  commit_xhlock kernel/locking/lockdep.c:5002 [inline]
>  commit_xhlocks kernel/locking/lockdep.c:5046 [inline]
>  lock_commit_crosslock+0xe73/0x1d10 kernel/locking/lockdep.c:5085
>  complete_release_commit include/linux/completion.h:49 [inline]
>  complete+0x24/0x80 kernel/sched/completion.c:39
>  wakeme_after_rcu+0xd/0x10 kernel/rcu/update.c:376
>  srcu_invoke_callbacks+0x280/0x4d0 kernel/rcu/srcutree.c:1161
>  process_one_work+0xbfd/0x1be0 kernel/workqueue.c:2098
>  worker_thread+0x223/0x1860 kernel/workqueue.c:2233
>  kthread+0x39c/0x470 kernel/kthread.c:231
>  ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@...glegroups.com.
>
> syzbot will keep track of this bug report.
> Once a fix for this bug is committed, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@...glegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/001a11404e22a6cfdd055cc8b450%40google.com.
> For more options, visit https://groups.google.com/d/optout.
