Date:   Wed, 9 Oct 2019 15:22:41 +0200
From:   "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>
To:     Jan Kara <jack@...e.cz>
Cc:     Jens Axboe <axboe@...nel.dk>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>, Tejun Heo <tj@...nel.org>,
        AceLan Kao <acelan.kao@...onical.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel@...r.kernel.org
Subject: Re: System hangs if NVMe/SSD is removed during suspend

On 10/7/2019 12:08 PM, Jan Kara wrote:
> On Fri 04-10-19 07:32:40, Jens Axboe wrote:
>> On 10/4/19 5:01 AM, Mika Westerberg wrote:
>>> On Fri, Oct 04, 2019 at 11:59:26AM +0200, Rafael J. Wysocki wrote:
>>>> On Friday, October 4, 2019 10:03:40 AM CEST Mika Westerberg wrote:
>>>>> On Thu, Oct 03, 2019 at 09:50:33AM -0700, Tejun Heo wrote:
>>>>>> Hello, Mika.
>>>>>>
>>>>>> On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
>>>>>>> but from that discussion I don't see a more generic solution being
>>>>>>> implemented.
>>>>>>>
>>>>>>> Any ideas how we should fix this properly?
>>>>>> Yeah, the only fix I can think of is not using a freezable wq.  It's
>>>>>> just not a good idea and not all that difficult to avoid using.
>>>>> OK, thanks.
>>>>>
>>>>> In that case I will just make a patch that removes WQ_FREEZABLE from
>>>>> bdi_wq and see what people think about it :)
>>>> I guess that depends on why WQ_FREEZABLE was added to it in the first place. :-)
>>>>
>>>> The reason might be to avoid writes to persistent storage after creating an
>>>> image during hibernation, since wqs remain frozen throughout the entire
>>>> hibernation including the image saving phase.
>>> Good point.
>>>
>>>> Arguably, making the wq freezable is kind of a sledgehammer approach to that
>>>> particular issue, but in principle it may prevent data corruption from
>>>> occurring, so be careful there.
>>> I tried to find the commit that introduced the "freezing" and I think it
>>> is this one:
>>>
>>>     03ba3782e8dc writeback: switch to per-bdi threads for flushing data
>>>
>>> Unfortunately from that commit it is not clear (at least to me) why it
>>> calls set_freezable() for the bdi task. It does not look like it has
>>> anything to do with blocking writes to storage while entering
>>> hibernation but I may be mistaken.
>> Wow, a decade ago...
>>
>> Honestly, I don't recall why these were marked freezable, and as I wrote
>> in the other reply, I don't think there's a good reason for that to be
>> the case.
> Well, can't it happen that the flush worker gets stuck in D state
> because some subsystem is already suspended, and thus hibernation fails
> (because AFAIK processes in uninterruptible sleep block hibernation)?
>
> I was also somewhat worried that the hibernation image could be
> inconsistent if flush workers do something while the hibernation image is
> being taken, but that does not seem to be a valid concern, as all kernel
> processes get frozen before the hibernation image is taken.

To be precise, nothing is scheduled while a hibernation image is being
created, but once the image has been created, threads that are not frozen
can be scheduled again, and there are kernel threads which aren't frozen.

So the question is whether any of the kernel threads that are not frozen
can do anything potentially unsafe if the bdi wq is not freezable, and I
don't quite see what that might be.
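The scheduling rule described above can be captured in a small toy model. This is a hypothetical user-space sketch in C, not kernel code: the phase names, struct toy_task and may_run() are invented for illustration. The model tracks only whether a task was frozen and ignores device suspend, CPU hotplug and everything else the real hibernation path does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical, not kernel code) of the hibernation phases
 * discussed in this thread. Only the "frozen or not" distinction matters.
 */
enum hib_phase {
	HIB_RUNNING,    /* normal operation, before tasks are frozen */
	HIB_SNAPSHOT,   /* image is being created: nothing is scheduled */
	HIB_IMAGE_SAVE, /* image has been created and is being written out */
};

struct toy_task {
	const char *name; /* label only, unused by may_run() */
	bool freezable;   /* e.g. a worker of a WQ_FREEZABLE workqueue */
};

/* Returns true if @t may be scheduled during @phase in this model. */
bool may_run(const struct toy_task *t, enum hib_phase phase)
{
	switch (phase) {
	case HIB_RUNNING:
		return true;
	case HIB_SNAPSHOT:
		/* Nothing runs while the image is being created. */
		return false;
	case HIB_IMAGE_SAVE:
		/*
		 * Frozen tasks stay frozen until hibernation ends, but
		 * threads that were never frozen can be scheduled again.
		 */
		return !t->freezable;
	}
	assert(!"unknown hibernation phase");
	return false;
}
```

In this model, dropping WQ_FREEZABLE from bdi_wq changes only the HIB_IMAGE_SAVE answer for the flush worker: it becomes schedulable while the image is being written. That is exactly the window whose safety is being questioned here.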

