linux-kernel - Re: possible deadlock in process_one

Message-ID: <CACT4Y+aV3OoRSABorkbqxj6MEs_sDeD5vqUjhMMhyqBs4NvVPA@mail.gmail.com>
Date:   Tue, 31 Oct 2017 16:21:46 +0300
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     syzbot 
        <bot+e24d104216808d90da4c438a6f38a239217c605b@...kaller.appspotmail.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs@...glegroups.com,
        kasan-dev <kasan-dev@...glegroups.com>
Subject: Re: possible deadlock in process_one_work

Another instance was reported here:
https://groups.google.com/d/msg/syzkaller-bugs/X6iDmVKBf2U/AHyJlnyaAgAJ


======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc6-next-20170824+ #8 Not tainted
------------------------------------------------------
kworker/u4:2/57 is trying to acquire lock:
  ((complete)&rcu.completion){+.+.}, at: [<ffffffff815afff5>]
__synchronize_srcu+0x1b5/0x250 kernel/rcu/srcutree.c:898

but task is already holding lock:
  (slab_mutex){+.+.}, at: [<ffffffff8192b730>] kmem_cache_destroy+0x30/0x250
mm/slab_common.c:821

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (slab_mutex){+.+.}:
        check_prevs_add kernel/locking/lockdep.c:2020 [inline]
        validate_chain kernel/locking/lockdep.c:2469 [inline]
        __lock_acquire+0x3286/0x4620 kernel/locking/lockdep.c:3498
        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
        __mutex_lock_common kernel/locking/mutex.c:756 [inline]
        __mutex_lock+0x16f/0x1870 kernel/locking/mutex.c:893
        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
        kmem_cache_create+0x39/0x2a0 mm/slab_common.c:435
        ptlock_cache_init+0x24/0x2d mm/memory.c:4632
        pgtable_init include/linux/mm.h:1756 [inline]
        mm_init init/main.c:504 [inline]
        start_kernel+0x3d4/0x7ad init/main.c:569
        x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:381
        x86_64_start_kernel+0x13c/0x149 arch/x86/kernel/head64.c:362
        verify_cpu+0x0/0xfb

-> #2 (memcg_cache_ids_sem){.+.+}:
        check_prevs_add kernel/locking/lockdep.c:2020 [inline]
        validate_chain kernel/locking/lockdep.c:2469 [inline]
        __lock_acquire+0x3286/0x4620 kernel/locking/lockdep.c:3498
        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
        down_read+0x96/0x150 kernel/locking/rwsem.c:23
        memcg_get_cache_ids+0x10/0x20 mm/memcontrol.c:274
        list_lru_destroy+0x96/0x490 mm/list_lru.c:573
        deactivate_locked_super+0x94/0xd0 fs/super.c:315
        deactivate_super+0x141/0x1b0 fs/super.c:339
        cleanup_mnt+0xb2/0x150 fs/namespace.c:1113
        mntput_no_expire+0x6e0/0xa90 fs/namespace.c:1179
        mntput fs/namespace.c:1189 [inline]
        kern_unmount+0x9c/0xd0 fs/namespace.c:2934
        pid_ns_release_proc+0x37/0x50 fs/proc/root.c:231
        proc_cleanup_work+0x19/0x20 kernel/pid_namespace.c:79
        process_one_work+0xbfd/0x1be0 kernel/workqueue.c:2098
        worker_thread+0x223/0x1860 kernel/workqueue.c:2233
        kthread+0x39c/0x470 kernel/kthread.c:231
        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

-> #1 ((&ns->proc_work)){+.+.}:
        process_one_work+0xba5/0x1be0 kernel/workqueue.c:2095
        worker_thread+0x223/0x1860 kernel/workqueue.c:2233
        kthread+0x39c/0x470 kernel/kthread.c:231
        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
        0xffffffffffffffff

-> #0 ((complete)&rcu.completion){+.+.}:
        check_prev_add+0x865/0x1520 kernel/locking/lockdep.c:1894
        check_prevs_add kernel/locking/lockdep.c:2020 [inline]
        validate_chain kernel/locking/lockdep.c:2469 [inline]
        __lock_acquire+0x3286/0x4620 kernel/locking/lockdep.c:3498
        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
        complete_acquire include/linux/completion.h:39 [inline]
        __wait_for_common kernel/sched/completion.c:108 [inline]
        wait_for_common kernel/sched/completion.c:122 [inline]
        wait_for_completion+0xc8/0x770 kernel/sched/completion.c:143
        __synchronize_srcu+0x1b5/0x250 kernel/rcu/srcutree.c:898
        synchronize_srcu_expedited kernel/rcu/srcutree.c:923 [inline]
        synchronize_srcu+0x1a3/0x560 kernel/rcu/srcutree.c:974
        quarantine_remove_cache+0xd7/0xf0 mm/kasan/quarantine.c:327
        kasan_cache_shutdown+0x9/0x10 mm/kasan/kasan.c:381
        shutdown_cache+0x15/0x1b0 mm/slab_common.c:531
        kmem_cache_destroy+0x236/0x250 mm/slab_common.c:829
        tipc_server_stop+0x13f/0x190 net/tipc/server.c:636
        tipc_topsrv_stop+0x1fe/0x350 net/tipc/subscr.c:390
        tipc_exit_net+0x15/0x40 net/tipc/core.c:96
        ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:142
        cleanup_net+0x5c7/0xb60 net/core/net_namespace.c:483
        process_one_work+0xbfd/0x1be0 kernel/workqueue.c:2098
        worker_thread+0x223/0x1860 kernel/workqueue.c:2233
        kthread+0x39c/0x470 kernel/kthread.c:231
        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

other info that might help us debug this:

Chain exists of:
   (complete)&rcu.completion --> memcg_cache_ids_sem --> slab_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slab_mutex);
                                lock(memcg_cache_ids_sem);
                                lock(slab_mutex);
   lock((complete)&rcu.completion);

  *** DEADLOCK ***

5 locks held by kworker/u4:2/57:
  #0:  ("%s""netns"){.+.+}, at: [<ffffffff81464534>] __write_once_size
include/linux/compiler.h:305 [inline]
  #0:  ("%s""netns"){.+.+}, at: [<ffffffff81464534>] atomic64_set
arch/x86/include/asm/atomic64_64.h:33 [inline]
  #0:  ("%s""netns"){.+.+}, at: [<ffffffff81464534>] atomic_long_set
include/asm-generic/atomic-long.h:56 [inline]
  #0:  ("%s""netns"){.+.+}, at: [<ffffffff81464534>] set_work_data
kernel/workqueue.c:617 [inline]
  #0:  ("%s""netns"){.+.+}, at: [<ffffffff81464534>]
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
  #0:  ("%s""netns"){.+.+}, at: [<ffffffff81464534>]
process_one_work+0xad4/0x1be0 kernel/workqueue.c:2090
  #1:  (net_cleanup_work){+.+.}, at: [<ffffffff8146458c>]
process_one_work+0xb2c/0x1be0 kernel/workqueue.c:2094
  #2:  (net_mutex){+.+.}, at: [<ffffffff83e50bc7>] cleanup_net+0x247/0xb60
net/core/net_namespace.c:449
  #3:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8192b722>]
get_online_cpus include/linux/cpu.h:126 [inline]
  #3:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8192b722>]
kmem_cache_destroy+0x22/0x250 mm/slab_common.c:818
  #4:  (slab_mutex){+.+.}, at: [<ffffffff8192b730>]
kmem_cache_destroy+0x30/0x250 mm/slab_common.c:821

stack backtrace:
CPU: 1 PID: 57 Comm: kworker/u4:2 Not tainted 4.13.0-rc6-next-20170824+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
  __dump_stack lib/dump_stack.c:16 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:52
  print_circular_bug+0x503/0x710 kernel/locking/lockdep.c:1259
  check_prev_add+0x865/0x1520 kernel/locking/lockdep.c:1894
  check_prevs_add kernel/locking/lockdep.c:2020 [inline]
  validate_chain kernel/locking/lockdep.c:2469 [inline]
  __lock_acquire+0x3286/0x4620 kernel/locking/lockdep.c:3498
  lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
  complete_acquire include/linux/completion.h:39 [inline]
  __wait_for_common kernel/sched/completion.c:108 [inline]
  wait_for_common kernel/sched/completion.c:122 [inline]
  wait_for_completion+0xc8/0x770 kernel/sched/completion.c:143
  __synchronize_srcu+0x1b5/0x250 kernel/rcu/srcutree.c:898
  synchronize_srcu_expedited kernel/rcu/srcutree.c:923 [inline]
  synchronize_srcu+0x1a3/0x560 kernel/rcu/srcutree.c:974
  quarantine_remove_cache+0xd7/0xf0 mm/kasan/quarantine.c:327
  kasan_cache_shutdown+0x9/0x10 mm/kasan/kasan.c:381
  shutdown_cache+0x15/0x1b0 mm/slab_common.c:531
  kmem_cache_destroy+0x236/0x250 mm/slab_common.c:829
  tipc_server_stop+0x13f/0x190 net/tipc/server.c:636
  tipc_topsrv_stop+0x1fe/0x350 net/tipc/subscr.c:390
  tipc_exit_net+0x15/0x40 net/tipc/core.c:96
  ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:142
  cleanup_net+0x5c7/0xb60 net/core/net_namespace.c:483
  process_one_work+0xbfd/0x1be0 kernel/workqueue.c:2098
  worker_thread+0x223/0x1860 kernel/workqueue.c:2233
  kthread+0x39c/0x470 kernel/kthread.c:231
  ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
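
For readers following the report above, the cycle can be reproduced with a small sketch. It is not lockdep's actual algorithm, only an illustration under stated assumptions: the edge list is transcribed by hand from dependency entries #0 to #3 in the report, "A --> B" is read as "B was acquired while A was held", and the helper name find_path is hypothetical. The check asks whether the lock being acquired already leads, through existing dependencies, to the lock currently held; if it does, adding the new dependency closes a cycle.

```python
from collections import defaultdict


def find_path(edges, start, goal):
    """Return a list of lock names leading from start to goal, or None.

    edges is an iterable of (a, b) pairs meaning "b was acquired while a
    was held" (an assumption of this sketch, mirroring the "-->" arrows
    in the report).
    """
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)

    stack = [(start, [start])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph[node]:
            stack.append((nxt, path + [nxt]))
    return None


# Existing dependencies, transcribed by hand from entries #0..#3 above.
existing = [
    ("(complete)&rcu.completion", "(&ns->proc_work)"),
    ("(&ns->proc_work)", "memcg_cache_ids_sem"),
    ("memcg_cache_ids_sem", "slab_mutex"),
]

# The kworker holds slab_mutex and waits on the SRCU completion.
held = "slab_mutex"
wanted = "(complete)&rcu.completion"

# A path wanted -> ... -> held means the new edge held -> wanted is circular.
cycle = find_path(existing, wanted, held)
if cycle is not None:
    print(" --> ".join(cycle + [wanted]))
```

Running it prints the chain starting and ending at (complete)&rcu.completion, which matches the "Chain exists of" summary in the report with the work item included as the intermediate step.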

On Mon, Oct 30, 2017 at 10:34 PM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> On Mon, Oct 30, 2017 at 10:32 PM, syzbot
> <bot+e24d104216808d90da4c438a6f38a239217c605b@...kaller.appspotmail.com>
> wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 9506597de2cde02d48c11d5c250250b9143f59f7
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>> C reproducer is attached
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 4.13.0-rc6-next-20170824+ #8 Not tainted
>> ------------------------------------------------------
>> kworker/0:2/1313 is trying to acquire lock:
>>  ((shepherd).work){+.+.}, at: [<ffffffff8146458c>]
>> process_one_work+0xb2c/0x1be0 kernel/workqueue.c:2094
>>
>> but now in release context of a crosslock acquired at the following:
>>  ((complete)&rcu.completion){+.+.}, at: [<ffffffff815afff5>]
>> __synchronize_srcu+0x1b5/0x250 kernel/rcu/srcutree.c:898
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 ((complete)&rcu.completion){+.+.}:
>>        check_prevs_add kernel/locking/lockdep.c:2020 [inline]
>>        validate_chain kernel/locking/lockdep.c:2469 [inline]
>>        __lock_acquire+0x3286/0x4620 kernel/locking/lockdep.c:3498
>>        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
>>        complete_acquire include/linux/completion.h:39 [inline]
>>        __wait_for_common kernel/sched/completion.c:108 [inline]
>>        wait_for_common kernel/sched/completion.c:122 [inline]
>>        wait_for_completion+0xc8/0x770 kernel/sched/completion.c:143
>>        __synchronize_srcu+0x1b5/0x250 kernel/rcu/srcutree.c:898
>>        synchronize_srcu_expedited kernel/rcu/srcutree.c:923 [inline]
>>        synchronize_srcu+0x1a3/0x560 kernel/rcu/srcutree.c:974
>
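> This looks like an issue with KASAN, which unexpectedly calls
> synchronize_srcu() from kmem_cache_shrink() (through
> kasan_cache_shrink() and quarantine_remove_cache(), as the
> surrounding frames show).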
> So +kasan-dev, Tejun, Lai to BCC.
>
>
>>        quarantine_remove_cache+0xd7/0xf0 mm/kasan/quarantine.c:327
>>        kasan_cache_shrink+0x9/0x10 mm/kasan/kasan.c:380
>>        kmem_cache_shrink+0x15/0x30 mm/slab_common.c:857
>>        acpi_os_purge_cache+0x15/0x20 drivers/acpi/osl.c:1560
>>        acpi_purge_cached_objects+0x38/0xc9 drivers/acpi/acpica/utxface.c:271
>>        acpi_initialize_objects+0xc5/0x112 drivers/acpi/acpica/utxfinit.c:302
>>        acpi_bus_init drivers/acpi/bus.c:1131 [inline]
>>        acpi_init+0x23c/0x8e6 drivers/acpi/bus.c:1220
>>        do_one_initcall+0x9e/0x330 init/main.c:826
>>        do_initcall_level init/main.c:892 [inline]
>>        do_initcalls init/main.c:900 [inline]
>>        do_basic_setup init/main.c:918 [inline]
>>        kernel_init_freeable+0x469/0x521 init/main.c:1066
>>        kernel_init+0x13/0x172 init/main.c:993
>>        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>
>> -> #1 (cpu_hotplug_lock.rw_sem){++++}:
>>        check_prevs_add kernel/locking/lockdep.c:2020 [inline]
>>        validate_chain kernel/locking/lockdep.c:2469 [inline]
>>        __lock_acquire+0x3286/0x4620 kernel/locking/lockdep.c:3498
>>        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
>>        percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:35
>> [inline]
>>        percpu_down_read include/linux/percpu-rwsem.h:58 [inline]
>>        cpus_read_lock+0x42/0x90 kernel/cpu.c:218
>>        get_online_cpus include/linux/cpu.h:126 [inline]
>>        vmstat_shepherd+0x3d/0x1b0 mm/vmstat.c:1707
>>        process_one_work+0xbfd/0x1be0 kernel/workqueue.c:2098
>>        worker_thread+0x223/0x1860 kernel/workqueue.c:2233
>>        kthread+0x39c/0x470 kernel/kthread.c:231
>>        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>
>> -> #0 ((shepherd).work){+.+.}:
>>        process_one_work+0xba5/0x1be0 kernel/workqueue.c:2095
>>        worker_thread+0x223/0x1860 kernel/workqueue.c:2233
>>        kthread+0x39c/0x470 kernel/kthread.c:231
>>        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>        0xffffffffffffffff
>>
>> other info that might help us debug this:
>>
>> Chain exists of:
>>   (shepherd).work --> cpu_hotplug_lock.rw_sem --> (complete)&rcu.completion
>>
>>  Possible unsafe locking scenario by crosslock:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(cpu_hotplug_lock.rw_sem);
>>   lock((complete)&rcu.completion);
>>                                lock((shepherd).work);
>>                                unlock((complete)&rcu.completion);
>>
>>  *** DEADLOCK ***
>>
>> 3 locks held by kworker/0:2/1313:
>>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
>> __write_once_size include/linux/compiler.h:305 [inline]
>>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
>> atomic64_set arch/x86/include/asm/atomic64_64.h:33 [inline]
>>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
>> atomic_long_set include/asm-generic/atomic-long.h:56 [inline]
>>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
>> set_work_data kernel/workqueue.c:617 [inline]
>>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
>> set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>>  #0:  ("events_power_efficient"){.+.+}, at: [<ffffffff81464534>]
>> process_one_work+0xad4/0x1be0 kernel/workqueue.c:2090
>>  #1:  ((&(&sdp->work)->work)){+.+.}, at: [<ffffffff8146458c>]
>> process_one_work+0xb2c/0x1be0 kernel/workqueue.c:2094
>>  #2:  (&x->wait#5){....}, at: [<ffffffff81524c68>] complete+0x18/0x80
>> kernel/sched/completion.c:34
>>
>> stack backtrace:
>> CPU: 0 PID: 1313 Comm: kworker/0:2 Not tainted 4.13.0-rc6-next-20170824+ #8
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Workqueue: events_power_efficient srcu_invoke_callbacks
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:16 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>  print_circular_bug+0x503/0x710 kernel/locking/lockdep.c:1259
>>  check_prev_add+0x865/0x1520 kernel/locking/lockdep.c:1894
>>  commit_xhlock kernel/locking/lockdep.c:5002 [inline]
>>  commit_xhlocks kernel/locking/lockdep.c:5046 [inline]
>>  lock_commit_crosslock+0xe73/0x1d10 kernel/locking/lockdep.c:5085
>>  complete_release_commit include/linux/completion.h:49 [inline]
>>  complete+0x24/0x80 kernel/sched/completion.c:39
>>  wakeme_after_rcu+0xd/0x10 kernel/rcu/update.c:376
>>  srcu_invoke_callbacks+0x280/0x4d0 kernel/rcu/srcutree.c:1161
>>  process_one_work+0xbfd/0x1be0 kernel/workqueue.c:2098
>>  worker_thread+0x223/0x1860 kernel/workqueue.c:2233
>>  kthread+0x39c/0x470 kernel/kthread.c:231
>>  ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>
>>
>> ---
>> This bug is generated by a dumb bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for details.
>> Direct all questions to syzkaller@...glegroups.com.
>>
>> syzbot will keep track of this bug report.
>> Once a fix for this bug is committed, please reply to this email with:
>> #syz fix: exact-commit-title
>> To mark this as a duplicate of another syzbot report, please reply with:
>> #syz dup: exact-subject-of-another-report
>> If it's a one-off invalid bug report, please reply with:
>> #syz invalid
>> Note: if the crash happens again, it will cause creation of a new bug
>> report.