linux-kernel - Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJZ5v0g6+CR80=zQi=aTbJDc3FAMRAFUE=e-QURcwi_25zgAfw@mail.gmail.com>
Date:   Mon, 11 Jul 2022 20:13:18 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Greg KH <gregkh@...uxfoundation.org>,
        Oliver Neukum <oneukum@...e.com>,
        Wedson Almeida Filho <wedsonaf@...gle.com>,
        "Rafael J. Wysocki" <rjw@...k.pl>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        Len Brown <len.brown@...el.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to
 timeout when resuming from hibernation

On Sun, Jul 10, 2022 at 4:25 AM Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
>
> syzbot is reporting hung task at misc_open() [1], for there is a race
> window of AB-BA deadlock which involves probe_count variable.
>
> Even with "char: misc: allow calling open() callback without misc_mtx
> held" and "PM: hibernate: call wait_for_device_probe() without
> system_transition_mutex held", wait_for_device_probe() from snapshot_open()
> can sleep forever if probe_count cannot become 0.
>
> Since snapshot_open() is a userland-driven hibernation/resume request,
> it should be acceptable to fail if something is wrong.

Not really.

If you are resuming from hibernation and the image cannot be reached
(which is the situation described above), failing and continuing to
boot means discarding the image and possible user data loss.

There is no "graceful failure" in this case.

> Users would not want to wait for hours if device stopped responding.

If the device holding the image is not responding, we should better
wait for it or panic().  Or let the user make the system reboot.