lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 12 Oct 2017 16:01:48 -0400
From:   Tyler Hall <tylerwhall@...il.com>
To:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc:     Nicolai Stange <nicstange@...il.com>,
        Johannes Berg <johannes@...solutions.net>,
        "Paul E.McKenney" <paulmck@...ux.vnet.ibm.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: deadlock in debugfs synchronize_srcu() when unplugging USB

Hi,

I have a reproducible scenario wherein removing a USB device while
reading /sys/kernel/debug/usb/devices causes a deadlock. This should
not be specific to any USB device. Any USB device removal that causes
a call to debugfs_remove() has inverted lock ordering with respect to
the read() of debug/usb/devices.

e.g.
read thread: srcu_read_lock(&debugfs_srcu);
-- usb unplug --
  remove thread: mutex_lock(&usb_bus_idr_lock);
  remove thread: synchronize_srcu(&debugfs_srcu); <- blocked
read thread: mutex_lock(&usb_bus_idr_lock); <- blocked
read thread: srcu_read_unlock(&debugfs_srcu, ...);

This seems to be another flavor of what Johannes Berg reported:
deadlock in synchronize_srcu() in debugfs?
https://lkml.org/lkml/2017/3/23/415

I applied this patch set from Nicolai Stange and can no longer
reproduce the hang.
[RFC PATCH v2 0/9] debugfs: per-file removal protection
https://lkml.org/lkml/2017/5/3/292

As patch 2/9 in the series indicates, commit 49d200deaa68 ("debugfs:
prevent access to removed files' private data") is where this was
first introduced, and it is reproducible on v4.14-rc4.

How should we move forward with the resolution of this debugfs change?
It seems to me that the USB locking is reasonable but the debugfs
global srcu is overly restrictive. This could lead to unexpected lock
inversion any time a driver shares a mutex between its debugfs read
and removal paths.

Backtrace below. Thanks!

-Tyler Hall

This is easier to reproduce by adding a sleep before the
usb_bus_idr_lock, but I've seen it on an unmodified kernel.

diff --git a/drivers/usb/core/devices.c b/drivers/usb/core/devices.c
index 55dea2e7828f..534650cd0950 100644
--- a/drivers/usb/core/devices.c
+++ b/drivers/usb/core/devices.c
@@ -614,6 +614,7 @@ static ssize_t usb_device_read(struct file *file,
char __user *buf,
     if (!access_ok(VERIFY_WRITE, buf, nbytes))
         return -EFAULT;

+    msleep(1000);
     mutex_lock(&usb_bus_idr_lock);
     /* print devices for all busses */
     idr_for_each_entry(&usb_bus_idr, bus, id) {


[   24.240542] sysrq: SysRq : Show Blocked State
[   24.240765]   task                        PC stack   pid father
[   24.240975] kworker/0:2     D13840   881      2 0x80000000
[   24.241525] Workqueue: usb_hub_wq hub_event
[   24.241682] Call Trace:
[   24.242273]  __schedule+0x317/0x6d0
[   24.242442]  schedule+0x31/0x80
[   24.242514]  schedule_timeout+0x1d0/0x320
[   24.242603]  ? __queue_work+0x135/0x400
[   24.242689]  wait_for_completion+0x92/0xf0
[   24.242765]  ? wait_for_completion+0x92/0xf0
[   24.242841]  ? wake_up_q+0x70/0x70
[   24.242907]  __synchronize_srcu.part.14+0x71/0x90
[   24.242985]  ? trace_event_raw_event_rcu_torture_read+0xe0/0xe0
[   24.243169]  synchronize_srcu_expedited+0x22/0x30
[   24.243265]  ? synchronize_srcu_expedited+0x22/0x30
[   24.243347]  synchronize_srcu+0x9a/0xc0
[   24.243418]  debugfs_remove+0x6d/0xa0
[   24.243490]  bdi_unregister+0x8b/0x170
[   24.243558]  del_gendisk+0x139/0x220
[   24.243624]  sd_remove+0x5c/0xc0
[   24.243685]  device_release_driver_internal+0x150/0x210
[   24.243769]  device_release_driver+0xd/0x10
[   24.243841]  bus_remove_device+0xdb/0x120
[   24.243915]  device_del+0x1c3/0x2e0
[   24.243977]  __scsi_remove_device+0xff/0x130
[   24.244122]  scsi_forget_host+0x5b/0x60
[   24.244203]  scsi_remove_host+0x74/0x140
[   24.244281]  usb_stor_disconnect+0x54/0xc0
[   24.244357]  usb_unbind_interface+0x6d/0x260
[   24.244437]  device_release_driver_internal+0x150/0x210
[   24.244520]  device_release_driver+0xd/0x10
[   24.244591]  bus_remove_device+0xdb/0x120
[   24.244659]  device_del+0x1c3/0x2e0
[   24.244722]  usb_disable_device+0x97/0x1f0
[   24.244792]  usb_disconnect+0x88/0x230
[   24.244853]  hub_event+0x5b9/0x11e0
[   24.244915]  ? add_timer+0x10e/0x230
[   24.244984]  process_one_work+0x146/0x3e0
[   24.245124]  worker_thread+0x43/0x3e0
[   24.245204]  kthread+0x104/0x140
[   24.245266]  ? create_worker+0x190/0x190
[   24.245333]  ? kthread_create_on_node+0x40/0x40
[   24.245406]  ret_from_fork+0x22/0x30

[   24.245542] cat             D13712  1029   1018 0x00000000
[   24.245652] Call Trace:
[   24.245705]  __schedule+0x317/0x6d0
[   24.245770]  schedule+0x31/0x80
[   24.245830]  schedule_preempt_disabled+0x9/0x10
[   24.245903]  __mutex_lock.isra.2+0x225/0x470
[   24.245975]  __mutex_lock_slowpath+0xe/0x10
[   24.246110]  ? __mutex_lock_slowpath+0xe/0x10
[   24.246199]  mutex_lock+0x2a/0x30
[   24.246261]  usb_device_read+0xb6/0x140
[   24.246325]  full_proxy_read+0x4f/0x90
[   24.246394]  __vfs_read+0x23/0x120
[   24.246456]  ? security_file_permission+0x96/0xb0
[   24.246533]  ? rw_verify_area+0x49/0xb0
[   24.246593]  vfs_read+0x8e/0x130
[   24.246646]  SyS_read+0x41/0xa0
[   24.246698]  entry_SYSCALL_64_fastpath+0x13/0x94

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ