[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7ddb25ff-60e5-75be-8080-2a7465cca68c@I-love.SAKURA.ne.jp>
Date: Wed, 6 Jul 2022 15:21:15 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Len Brown <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
arnd@...db.de, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org,
Wedson Almeida Filho <wedsonaf@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable
On 2022/07/05 23:35, Tetsuo Handa wrote:
> On 2022/07/05 23:16, Greg KH wrote:
>> Some device is being probed at the moment, maybe we have a deadlock
>> somewhere here...
>
> Lockdep says __device_attach() from hub_event() was in progress.
>
> ----------------------------------------
> [ 237.376478][ T28] 5 locks held by kworker/1:1/26:
> [ 237.381526][ T28] #0: ffff888016b92538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610
> [ 237.392798][ T28] #1: ffffc90000c2fda8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610
> [ 237.406354][ T28] #2: ffff88801f7ee220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4680
> [ 237.415920][ T28] #3: ffff88801b6c6220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
> [ 237.426682][ T28] #4: ffff8880216bc1a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
> ----------------------------------------
>
The number returned by atomic_read(&probe_count) matched the number of works for hub_event()
in usb_hub_wq workqueue. The probe function is called from hub_event(), and
usb_stor_msg_common() calls wait_for_completion_interruptible_timeout(MAX_SCHEDULE_TIMEOUT)
via driver's init function.
But if the usb device is unresponsive, wait_for_completion_interruptible_timeout() sleeps
forever. And in this testcase (which emulates usb devices using /dev/raw-gadget interface),
the usb device became unresponsive because the process who is responsible with reading/writing
/dev/raw-gadget interface is blocked at mutex_lock(&misc_mtx) at misc_open(), and results in
an AB-BA deadlock condition. Making misc_open() killable solved this problem, by allowing
the opener of /dev/raw-gadget interface to call fput() upon "send SIGKILL after 5 seconds from
fork()" behavior.
Anyway,
/*
* Resuming. We may need to wait for the image device to
* appear.
*/
wait_for_device_probe();
in snapshot_open() will sleep forever if some device became unresponsive.
How should we fix this problem?
SysRq-t output (location is based on Linux 5.18.0):
----------------------------------------
[ 113.988455][ T4455] Calling probe_count=2
[ 115.002350][ C2] sysrq: Show State
[ 124.602603][ C2] task:kworker/2:2 state:S stack:11608 pid: 980 ppid: 2 flags:0x00004000
[ 124.613409][ C2] Workqueue: usb_hub_wq hub_event
[ 124.618941][ C2] Call Trace:
[ 124.624046][ C2] <TASK>
[ 124.629122][ C2] __schedule+0x3b6/0x1690
[ 124.634575][ C2] ? _raw_spin_unlock_irq+0x1f/0x40
[ 124.640085][ C2] schedule+0x49/0xb0
[ 124.645280][ C2] schedule_timeout+0x134/0x180
[ 124.650574][ C2] ? _raw_spin_unlock_irq+0x1f/0x40
[ 124.655880][ C2] ? lockdep_hardirqs_on+0x79/0x100
[ 124.662294][ C2] __wait_for_common+0xac/0x1f0
[ 124.667740][ C2] ? usleep_range_state+0xe0/0xe0
[ 124.673449][ C2] usb_stor_msg_common+0x157/0x1f0 usb_stor_msg_common at drivers/usb/storage/transport.c:166 (discriminator 4)
[ 124.679074][ C2] ? verify_cpu+0xf0/0x100
[ 124.684774][ C2] usb_stor_ctrl_transfer+0xc3/0xf0 usb_stor_ctrl_transfer at drivers/usb/storage/transport.c:344
[ 124.690387][ C2] usb_stor_sddr09_dpcm_init+0x63/0x1a0 sddr09_send_command at drivers/usb/storage/sddr09.c:307
(inlined by) usb_stor_sddr09_dpcm_init at drivers/usb/storage/sddr09.c:1455
[ 124.696087][ C2] ? sddr09_get_cardinfo+0x530/0x530
[ 124.701654][ C2] usb_stor_probe2+0x24f/0x470 usb_stor_acquire_resources at drivers/usb/storage/usb.c:788
(inlined by) usb_stor_probe2 at drivers/usb/storage/usb.c:1050
[ 124.707024][ C2] ? sddr09_transport+0x1940/0x1940
[ 124.712690][ C2] sddr09_probe+0xd6/0x110
[ 124.717980][ C2] usb_probe_interface+0x19b/0x3e0
[ 124.723533][ C2] ? usb_match_dynamic_id+0xe0/0xe0
[ 124.728900][ C2] really_probe+0x138/0x4c0
[ 124.734058][ C2] __driver_probe_device+0x191/0x220
[ 124.739406][ C2] driver_probe_device+0x2a/0x120
[ 124.744649][ C2] __device_attach_driver+0x105/0x1a0
[ 124.749981][ C2] ? driver_allows_async_probing+0x90/0x90
[ 124.755600][ C2] bus_for_each_drv+0xba/0x100
[ 124.761136][ C2] __device_attach+0x130/0x290
[ 124.766408][ C2] bus_probe_device+0xdb/0xf0
[ 124.771735][ C2] device_add+0x635/0xdf0
[ 124.776770][ C2] ? __mutex_unlock_slowpath+0x37/0x280
[ 124.782189][ C2] usb_set_configuration+0x9ca/0xd10
[ 124.787516][ C2] usb_generic_driver_probe+0x8c/0xc0
[ 124.792906][ C2] usb_probe_device+0x6c/0x180
[ 124.798041][ C2] ? usb_driver_release_interface+0xc0/0xc0
[ 124.803416][ C2] really_probe+0x138/0x4c0
[ 124.808329][ C2] __driver_probe_device+0x191/0x220
[ 124.813373][ C2] driver_probe_device+0x2a/0x120
[ 124.818345][ C2] __device_attach_driver+0x105/0x1a0
[ 124.823436][ C2] ? driver_allows_async_probing+0x90/0x90
[ 124.828730][ C2] bus_for_each_drv+0xba/0x100
[ 124.834151][ C2] __device_attach+0x130/0x290
[ 124.840066][ C2] bus_probe_device+0xdb/0xf0
[ 124.845475][ C2] device_add+0x635/0xdf0
[ 124.850835][ C2] usb_new_device.cold+0x110/0x5e9
[ 124.856251][ C2] hub_event+0x177c/0x26b0
[ 124.861609][ C2] process_one_work+0x3d4/0x9c0
[ 124.866730][ C2] worker_thread+0x5b/0x5d0
[ 124.874165][ C2] ? process_one_work+0x9c0/0x9c0
[ 124.879966][ C2] kthread+0x135/0x170
[ 124.884870][ C2] ? kthread_complete_and_exit+0x30/0x30
[ 124.890628][ C2] ret_from_fork+0x1f/0x30
[ 124.896103][ C2] </TASK>
[ 143.975320][ C2] Showing all locks held in the system:
[ 143.985599][ C2] 5 locks held by kworker/2:2/980:
[ 143.991149][ C2] #0: ffff88800d615938 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x304/0x9c0
[ 144.002742][ C2] #1: ffffc90003ed3e70 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x304/0x9c0
[ 144.015128][ C2] #2: ffff88800fe66a20 (&dev->mutex){....}-{3:3}, at: hub_event+0xc4/0x26b0
[ 144.022646][ C2] #3: ffff88810e445a20 (&dev->mutex){....}-{3:3}, at: __device_attach+0x41/0x290
[ 144.032770][ C2] #4: ffff88810e4449a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x41/0x290
[ 144.045877][ C2] 1 lock held by in:imklog/3844:
[ 144.052399][ C2] #0: ffff888106694ee8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x8d/0xa0
[ 144.060131][ C2] 1 lock held by a.out/4425:
[ 144.066811][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.074587][ C2] 3 locks held by systemd-udevd/4439:
[ 144.081494][ C2] #0: ffff8880140bc688 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_read_iter+0x9d/0x2e0
[ 144.095133][ C2] #1: ffff88810ba50938 (kn->active#60){++++}-{0:0}, at: kernfs_fop_read_iter+0xa5/0x2e0
[ 144.109502][ C2] #2: ffff88810e445a20 (&dev->mutex){....}-{3:3}, at: read_descriptors+0x3c/0x1e0
[ 144.123733][ C2] 2 locks held by a.out/4455:
[ 144.131152][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.139727][ C2] #1: ffffffff8652d068 (system_transition_mutex){+.+.}-{3:3}, at: snapshot_open+0x40/0x160
[ 144.154965][ C2] 1 lock held by a.out/4456:
[ 144.162582][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.171374][ C2] 1 lock held by a.out/4457:
[ 144.179220][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.188231][ C2] 1 lock held by a.out/4458:
[ 144.196069][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.204946][ C2] 1 lock held by a.out/4459:
[ 144.214175][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.246256][ C2] Showing busy workqueues and worker pools:
[ 144.254448][ C2] workqueue usb_hub_wq: flags=0x4
[ 144.262458][ C2] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[ 144.262554][ C2] in-flight: 980:hub_event hub_event
[ 144.262743][ C2] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=29s workers=4 idle: 28 3084 53
[ 144.290063][ T4455] Calling probe_count=2
----------------------------------------
Powered by blists - more mailing lists