linux-kernel - Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port


Message-ID: <YNmdhqd+W3XbJCwd@T590>
Date:   Mon, 28 Jun 2021 17:59:34 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Daniel Wagner <dwagner@...e.de>
Cc:     wenxiong@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
        james.smart@...adcom.com, wenxiong@...ibm.com, sagi@...mberg.me
Subject: Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing
 port

On Mon, Jun 28, 2021 at 11:07:03AM +0200, Daniel Wagner wrote:
> Hi Wen,
> 
> On Sun, Jun 27, 2021 at 10:14:32PM -0500, wenxiong@...ux.vnet.ibm.com wrote:
> > @@ -468,8 +467,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> >  	data.hctx = q->queue_hw_ctx[hctx_idx];
> >  	if (!blk_mq_hw_queue_mapped(data.hctx))
> >  		goto out_queue_exit;
> > -	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
> > -	data.ctx = __blk_mq_get_ctx(q, cpu);
> > +	data.ctx = __blk_mq_get_ctx(q, hctx_idx);
> 
> hctx_idx is just an index, not a CPU id. In this scenario, the hctx_idx
> used to lookup the context happens to be valid. I am still a bit
> confused why [1] doesn't work for this scenario.

[1] is fine from the blk-mq viewpoint, but nvme needs to improve its
failure handling; otherwise, in the worst case, no io queues may be
connected.

> 
> As Ming pointed out in [2] we need to update cpumask for CPU hotplug

I mentioned that there is still a hole with your patch; I did not mean
that we need to update the cpumask.

The root cause is that blk-mq doesn't work well on tag allocation from a
specified hctx (blk_mq_alloc_request_hctx()), and blk-mq assumes that no
request allocation can cross an hctx becoming inactive/offline; see
blk_mq_hctx_notify_offline() and blk_mq_get_tag(). Either the allocated
request is completed, or new allocation is prevented, before the current
hctx becomes inactive (every CPU in hctx->cpumask is offline).
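
To make the rule above concrete, here is a minimal userspace sketch. It
is not kernel code: every model_* name and MODEL_NR_CPUS are hypothetical
and exist only for illustration. It assumes a 64-bit mask standing in for
hctx->cpumask and cpu_online_mask, and mimics the cpumask_first_and()
lookup that the quoted patch removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64	/* hypothetical limit for this model */

/* Toy stand-in for a hardware queue; only the CPU mask matters here. */
struct model_hctx {
	uint64_t cpumask;	/* bit n set => CPU n is mapped to this hctx */
};

/*
 * Analogue of cpumask_first_and(): lowest CPU set in both masks, or
 * MODEL_NR_CPUS when the intersection is empty.
 */
static unsigned int model_first_and(uint64_t a, uint64_t b)
{
	uint64_t both = a & b;

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/* In this model, the hctx is inactive when none of its CPUs is online. */
static bool model_hctx_inactive(const struct model_hctx *h, uint64_t online)
{
	return model_first_and(h->cpumask, online) >= MODEL_NR_CPUS;
}

/* Pick the CPU whose context would be used; -1 if the hctx is inactive. */
static int model_pick_cpu(const struct model_hctx *h, uint64_t online)
{
	unsigned int cpu = model_first_and(h->cpumask, online);

	return cpu < MODEL_NR_CPUS ? (int)cpu : -1;
}

/* Return the online mask with one CPU taken offline. */
static uint64_t model_cpu_offline(uint64_t online, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return online & ~(UINT64_C(1) << cpu);
}
```

With an hctx mapped to CPUs 4 and 5, model_pick_cpu() returns 4 while
both are online, 5 once CPU 4 goes offline, and -1 once CPU 5 goes
offline too. The last case is the one where a request allocated on that
specific hctx has no online CPU left to run on. It also shows why an hctx
index cannot stand in for a CPU id: the chosen CPU comes from the mask,
not from the position of the hctx in the queue array.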

I tried [1] to move connecting the io queues into the driver and kill
blk_mq_alloc_request_hctx() to address this issue, but there is a corner
case (timeout) that is not covered.

I understand that NVMe's requirement is that connecting an io queue
should succeed no matter whether the hctx is inactive or not. Sagi,
correct me if I am wrong.

[1]
https://lore.kernel.org/linux-block/fda43a50-a484-dde7-84a1-94ccf9346bdd@broadcom.com/T/#m1e902f69e8503f5e6202945b8b79e5b7252e3689

Thanks,
Ming