lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 25 May 2019 07:58:49 +0800
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     Jiri Kosina <jikos@...nel.org>
Cc:     Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
        Sagi Grimberg <sagi@...mberg.me>, linux-kernel@...r.kernel.org,
        linux-nvme@...ts.infradead.org,
        Keith Busch <keith.busch@...el.com>,
        Hannes Reinecke <hare@...e.de>, Christoph Hellwig <hch@....de>
Subject: Re: [5.2-rc1 regression]: nvme vs. hibernation

Hi Jiri,

Looks this has been discussed in the past.

http://lists.infradead.org/pipermail/linux-nvme/2019-April/023234.html

I created a fix for a case but not good enough.

http://lists.infradead.org/pipermail/linux-nvme/2019-April/023277.html

Perhaps people would have better solution.

Dongli Zhang

On 05/25/2019 06:27 AM, Jiri Kosina wrote:
> On Fri, 24 May 2019, Keith Busch wrote:
> 
>>> Something is broken in Linus' tree (4dde821e429) with respec to 
>>> hibernation on my thinkpad x270, and it seems to be nvme related.
>>>
>>> I reliably see the warning below during hibernation, and then sometimes 
>>> resume sort of works but the machine misbehaves here and there (seems like 
>>> lost IRQs), sometimes it never comes back from the hibernated state.
>>>
>>> I will not have too much have time to look into this over weekend, so I am 
>>> sending this out as-is in case anyone has immediate idea. Otherwise I'll 
>>> bisect it on monday (I don't even know at the moment what exactly was the 
>>> last version that worked reliably, I'll have to figure that out as well 
>>> later).
>>
>> I believe the warning call trace was introduced when we converted nvme to
>> lock-less completions. On device shutdown, we'll check queues for any
>> pending completions, and we temporarily disable the interrupts to make
>> sure that queues interrupt handler can't run concurrently.
> 
> Yeah, the completion changes were the primary reason why I brought this up 
> with all of you guys in CC.
> 
>> On hibernation, most CPUs are offline, and the interrupt re-enabling
>> is hitting this warning that says the IRQ is not associated with any
>> online CPUs.
>>
>> I'm sure we can find a way to fix this warning, but I'm not sure that
>> explains the rest of the symptoms you're describing though.
> 
> It seems to be more or less reliable enough for bisect. I'll try that on 
> monday and will let you know.
> 
> Thanks,
> 

Powered by blists - more mailing lists