Message-ID: <9e45b464-c680-7d22-c81d-9640059ef913@suse.com>
Date:   Wed, 6 Jul 2022 14:17:38 +0200
From:   Oliver Neukum <oneukum@...e.com>
To:     Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        Greg KH <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     Len Brown <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
        Arnd Bergmann <arnd@...db.de>, linux-kernel@...r.kernel.org,
        linux-pm@...r.kernel.org,
        Wedson Almeida Filho <wedsonaf@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Arjan van de Ven <arjan@...ux.intel.com>
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 06.07.22 12:26, Tetsuo Handa wrote:

> wait_for_device_probe() in snapshot_open() was added by commit c751085943362143
> ("PM/Hibernate: Wait for SCSI devices scan to complete during resume"), and
> that commit did not take into account possibility of unresponsive hardware.
> 
>    "In addition, if the resume from hibernation is userland-driven, it's
>     better to wait for all device probes in the kernel to complete before
>     attempting to open the resume device."
> 
> 

Tetsuo-san,

I am afraid my first reply was too curt to be useful. Sorry for that.
First, let me congratulate you on finding and analyzing an important
issue.
Yet, while your analysis is good, I am afraid your attempted fix
stays too close to the analysis instead of taking a step back and
looking at the root cause.
Frankly, I was afraid you would look at UAS next and try to fix it in
the same way. And that is the core of the issue: if an unresponsive
device can make the SCSI layer hang a host controller, the bug is in
the SCSI layer. If you were to insist on your current approach, you
would have to go through every host controller driver. You are seeing
this only with storage because you are fuzzing USB, not SCSI, but the
bug you found is more fundamental than any single bus system.

The SCSI layer is designed so that timeouts are handled by the core.
That is a fundamental design decision you cannot easily deviate from.
Hence I would like to ask you to take a closer look at the scanning
code in the SCSI layer rather than at a host controller driver.

	Regards
		Oliver