linux-kernel - Re: [PATCH 1/1] [RFC] blk-mq: fix queue stalling on shared hctx restart

Message-ID: <1508435243.2429.42.camel@wdc.com>
Date:   Thu, 19 Oct 2017 17:47:24 +0000
From:   Bart Van Assche <Bart.VanAssche@....com>
To:     "roman.penyaev@...fitbricks.com" <roman.penyaev@...fitbricks.com>
CC:     Bart Van Assche <Bart.VanAssche@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "hare@...e.com" <hare@...e.com>, "axboe@...com" <axboe@...com>,
        "hch@....de" <hch@....de>
Subject: Re: [PATCH 1/1] [RFC] blk-mq: fix queue stalling on shared hctx
 restart

On Wed, 2017-10-18 at 12:22 +0200, Roman Pen wrote:
> The patch below fixes queue stalling when a shared hctx is marked for restart
> (BLK_MQ_S_SCHED_RESTART bit) but q->shared_hctx_restart stays zero.  The
> root cause is that hctxs are shared between queues, but 'shared_hctx_restart'
> belongs to a particular queue, which may itself not need to be restarted,
> so we return early from blk_mq_sched_restart() and the shared hctx of another
> queue is never restarted.
> 
> The fix is to make the shared_hctx_restart counter belong not to the queue but
> to the tags, so that the counter reflects the real number of shared hctxs that
> need to be restarted.

Hello Roman,

The patch you posted looks fine to me but seeing this patch and the patch
description makes me wonder why this had not been noticed before. Are you
perhaps using a block driver that returns BLK_STS_RESOURCE more often than
other block drivers? Did you perhaps run into this with the InfiniBand
network block device (IBNBD) driver? No matter what driver triggered this,
I think this bug should be fixed.

Bart.
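
For illustration, the stall described in the quoted patch description can be
reproduced in a small user-space model. This is only a sketch under stated
assumptions, not kernel code: every struct and function name below
(model_tags, model_queue, mark_restart, sched_restart, waiter_is_restarted)
is hypothetical, atomics and RCU are left out, and the restart loop is
reduced to "re-run every marked hctx". It compares a restart counter kept
per queue with one kept per shared tag set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the kernel's structures. */
struct model_tags {
    int restart_count;      /* counter owned by the shared tag set (the fix) */
};

struct model_queue {
    struct model_tags *tags;
    int restart_count;      /* counter owned by the queue (original scheme) */
    bool hctx_restart;      /* stands in for the BLK_MQ_S_SCHED_RESTART bit */
    int runs;               /* how many times this queue's hctx was re-run */
};

enum counter_scope { COUNTER_PER_QUEUE, COUNTER_PER_TAGS };

/* Pick the counter that the chosen scheme consults for queue q. */
static int *counter_of(struct model_queue *q, enum counter_scope scope)
{
    return scope == COUNTER_PER_QUEUE ? &q->restart_count
                                      : &q->tags->restart_count;
}

/* Mark q's hctx as needing a restart and account for it in the counter. */
static void mark_restart(struct model_queue *q, enum counter_scope scope)
{
    if (!q->hctx_restart) {
        q->hctx_restart = true;
        ++*counter_of(q, scope);
    }
}

/* Model of the restart path taken when `completer` finishes a request. */
static void sched_restart(struct model_queue *completer,
                          struct model_queue **all, size_t n,
                          enum counter_scope scope)
{
    /* Early exit: "no hctx is marked", as seen through the counter. */
    if (*counter_of(completer, scope) == 0)
        return;

    for (size_t i = 0; i < n; i++) {
        struct model_queue *q = all[i];

        if (q->hctx_restart) {
            q->hctx_restart = false;
            --*counter_of(q, scope);
            q->runs++;
        }
    }
}

/*
 * Queue A is marked for restart, then queue B (same tag set) completes a
 * request. Returns true if A's hctx was re-run as a result.
 */
static bool waiter_is_restarted(enum counter_scope scope)
{
    struct model_tags tags = { 0 };
    struct model_queue a = { .tags = &tags };
    struct model_queue b = { .tags = &tags };
    struct model_queue *all[] = { &a, &b };

    mark_restart(&a, scope);
    sched_restart(&b, all, 2, scope);
    return a.runs == 1;
}
```

In this model, waiter_is_restarted(COUNTER_PER_QUEUE) returns false because
queue B only looks at its own counter, which is zero, and returns early.
waiter_is_restarted(COUNTER_PER_TAGS) returns true because the mark on
queue A is visible through the shared counter.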