Date:   Tue, 12 Jun 2018 10:22:43 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Chris Boot <bootc@....tc>, linux-kernel@...r.kernel.org,
        linux-block@...r.kernel.org
Subject: Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()

On 6/12/18 10:19 AM, Chris Boot wrote:
> On 12/06/18 17:09, Jens Axboe wrote:
>> On 6/12/18 9:38 AM, Chris Boot wrote:
>>> Hi folks,
>>>
>>> I maintain a large (to me) system with 112 threads (4x Intel E7-4830 v4)
>>> which has a MegaRAID SAS 9361-24i controller. This system is currently
>>> running Debian's 4.16.12 kernel (from stretch-backports) with blk_mq
>>> enabled.
>>>
>>> I've run into a lockup which appears to involve blk_mq and writeback
>>> throttling. It's hard to tell whether I've hit this same thing with
>>> older kernels; I've been trying to track down a deadlock that I was
>>> fairly certain involved the OOM killer, but this one doesn't seem to.
> [snip]
>>
>> Hmm that's really weird, I don't see how we could be spinning on the
>> waitqueue lock like that. I haven't seen any wbt bug reports like this
>> before.
>>
>> Are things generally stable if you just turn off wbt? You can do that
>> for sda, for instance, by doing:
>>
>> # echo 0 > /sys/block/sda/queue/wbt_lat_usec
>>
>> It'd be interesting to get this data point. E.g., leave blk-mq enabled,
>> and then just disable wbt.
> 
> Hi Jens,
> 
> Thanks for the speedy response. I'll see if I can get that tested soon;
> if the system is stable without blk_mq I can see the users wanting to
> keep it that way for a while. I'll let you know.

Understandable. I just get suspicious of the general state of the system,
if it's locking up there. Could be a hardware issue, or a bug in some
other area that's messing things up. I have wbt running on literally
hundreds of thousands of boxes and haven't seen a lockup like this.

>> Is anything disabling wbt in the system otherwise?
> 
> Not that I'm aware of, no.

OK, just wanted to rule out something related to the shutdown path
racing with IO.

-- 
Jens Axboe
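
For anyone repeating the test suggested in the thread, the one-line echo
into wbt_lat_usec can be wrapped so that the previous value is recorded
before it is overwritten. This is only a rough sketch, not something from
the original message: the function name wbt_set and its optional third
argument (an alternate sysfs root, handy for a dry run against a scratch
directory) are hypothetical. Only the /sys/block/<dev>/queue/wbt_lat_usec
path and the "write 0 to disable" behaviour come from the thread.

```shell
#!/bin/sh
# wbt_set DEV USEC [SYSFS_ROOT]
# Hypothetical helper: write USEC to DEV's wbt_lat_usec and report the
# old and new values. SYSFS_ROOT defaults to /sys; overriding it is an
# illustration-only convenience for dry runs.
wbt_set() {
    wbt_dev=$1
    wbt_usec=$2
    wbt_root=${3:-/sys}
    wbt_knob="$wbt_root/block/$wbt_dev/queue/wbt_lat_usec"

    # The file is absent if the device does not exist or the kernel was
    # built without wbt, and it is not writable without root.
    if [ ! -w "$wbt_knob" ]; then
        echo "wbt_set: $wbt_knob is missing or not writable" >&2
        return 1
    fi

    wbt_old=$(cat "$wbt_knob")
    echo "$wbt_usec" > "$wbt_knob" || return 1
    echo "$wbt_dev: wbt_lat_usec $wbt_old -> $(cat "$wbt_knob")"
}

# Example (as root): disable wbt on sda, as suggested in the thread.
#   wbt_set sda 0
```

Noting the old value makes it easy to put the setting back once the data
point has been collected. The kernel's queue sysfs documentation should
also describe how to restore the default; check it for the kernel in use.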
