linux-kernel - Re: [PATCH] char: misc: make misc_open() and misc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a6d98824-56c8-6d92-bb1b-eb065b57cb81@I-love.SAKURA.ne.jp>
Date:   Wed, 6 Jul 2022 19:26:28 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Greg KH <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     Len Brown <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
        Arnd Bergmann <arnd@...db.de>, linux-kernel@...r.kernel.org,
        linux-pm@...r.kernel.org,
        Wedson Almeida Filho <wedsonaf@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Arjan van de Ven <arjan@...ux.intel.com>
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/06 15:34, Greg KH wrote:
> On Wed, Jul 06, 2022 at 03:21:15PM +0900, Tetsuo Handa wrote:
>> How should we fix this problem?
> 
> We can decrease the timeout in usb_stor_msg_common().  I imagine that if
> that timeout is ever hit in this sequence, then all will recover, right?
> Try decreasing it to a sane number and see what happens.

Yes, all recovers with below diff.

------------------------------------------------------------
diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index 1928b3918242..d2a192306e0c 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -164,7 +164,7 @@ static int usb_stor_msg_common(struct us_data *us, int timeout)
  
 	/* wait for the completion of the URB */
 	timeleft = wait_for_completion_interruptible_timeout(
-			&urb_done, timeout ? : MAX_SCHEDULE_TIMEOUT);
+			&urb_done, timeout ? : 5 * HZ);
  
 	clear_bit(US_FLIDX_URB_ACTIVE, &us->dflags);
 
------------------------------------------------------------

But

>> Anyway,
>>
>>         /*
>>          * Resuming.  We may need to wait for the image device to
>>          * appear.
>>          */
>>         wait_for_device_probe();
>>
>> in snapshot_open() will sleep forever if some device became unresponsive.
>>

wait_for_device_probe() in snapshot_open() was added by commit c751085943362143
("PM/Hibernate: Wait for SCSI devices scan to complete during resume"), and
that commit did not take into account possibility of unresponsive hardware.

   "In addition, if the resume from hibernation is userland-driven, it's
    better to wait for all device probes in the kernel to complete before
    attempting to open the resume device."

It is trivial to make e.g. atomic_read(&probe_count) == 10, which means that
acceptable timeout for usb_stor_msg_common() may be no longer acceptable timeout
for wait_for_device_probe(). Unlike flush_workqueue(), wait_for_device_probe()
can wait forever if new probe requests keep coming in while waiting for existing
probe requests to complete. Therefore, I think we should introduce timeout on
wait_for_device_probe() side as well.

I would like to propose below changes in 3 patches as fixes for this problem.
Since there are 13 wait_for_device_probe() callers, maybe we want both killable
and uninterruptible versions and pass timeout as an argument...

------------------------------------------------------------
 drivers/base/dd.c               |    3 ++-
 drivers/char/misc.c             |    9 ++++++---
 drivers/usb/storage/transport.c |    2 +-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 3fc3b5940bb3..67e08b381ee2 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -723,7 +723,8 @@ void wait_for_device_probe(void)
 	flush_work(&deferred_probe_work);
 
 	/* wait for the known devices to complete their probing */
-	wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
+	wait_event_killable_timeout(probe_waitqueue,
+				    atomic_read(&probe_count) == 0, 60 * HZ);
 	async_synchronize_full();
 }
 EXPORT_SYMBOL_GPL(wait_for_device_probe);
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index ca5141ed5ef3..6430c534a1cb 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -104,7 +104,8 @@ static int misc_open(struct inode *inode, struct file *file)
 	int err = -ENODEV;
 	const struct file_operations *new_fops = NULL;
 
-	mutex_lock(&misc_mtx);
+	if (mutex_lock_killable(&misc_mtx))
+		return -EINTR;
 
 	list_for_each_entry(c, &misc_list, list) {
 		if (c->minor == minor) {
@@ -116,7 +117,8 @@ static int misc_open(struct inode *inode, struct file *file)
 	if (!new_fops) {
 		mutex_unlock(&misc_mtx);
 		request_module("char-major-%d-%d", MISC_MAJOR, minor);
-		mutex_lock(&misc_mtx);
+		if (mutex_lock_killable(&misc_mtx))
+			return -EINTR;
 
 		list_for_each_entry(c, &misc_list, list) {
 			if (c->minor == minor) {
@@ -178,7 +180,8 @@ int misc_register(struct miscdevice *misc)
 
 	INIT_LIST_HEAD(&misc->list);
 
-	mutex_lock(&misc_mtx);
+	if (mutex_lock_killable(&misc_mtx))
+		return -EINTR;
 
 	if (is_dynamic) {
 		int i = find_first_zero_bit(misc_minors, DYNAMIC_MINORS);
diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index 1928b3918242..d2a192306e0c 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -164,7 +164,7 @@ static int usb_stor_msg_common(struct us_data *us, int timeout)
  
 	/* wait for the completion of the URB */
 	timeleft = wait_for_completion_interruptible_timeout(
-			&urb_done, timeout ? : MAX_SCHEDULE_TIMEOUT);
+			&urb_done, timeout ? : 60 * HZ);
  
 	clear_bit(US_FLIDX_URB_ACTIVE, &us->dflags);
 
------------------------------------------------------------