Message-ID: <3513b14c-14e0-b865-628e-a83521090de9@huawei.com>
Date: Tue, 25 Oct 2022 08:40:38 +0100
From: John Garry <john.garry@...wei.com>
To: Ming Lei <ming.lei@...hat.com>
CC: <axboe@...nel.dk>, <linux-kernel@...r.kernel.org>,
<linux-block@...r.kernel.org>, <hch@....de>,
Bart Van Assche <bvanassche@....org>
Subject: Re: [PATCH] blk-mq: Properly init bios from
blk_mq_alloc_request_hctx()

On 25/10/2022 01:34, Ming Lei wrote:
>>>> but sometimes we just need to allocate for a specific HW
>>>> queue...
>>>>
>>>> For my usecase of interest, it should not impact if the cpumask of the HW
>>>> queue goes offline after selecting the cpu in blk_mq_alloc_request_hctx(),
>>>> so any race is ok ... I think.
>>>>
>>>> However it should be still possible to make blk_mq_alloc_request_hctx() more
>>>> robust. How about using something like work_on_cpu_safe() to allocate and
>>>> execute the request with blk_mq_alloc_request() on a cpu associated with the
>>>> HW queue, such that we know the cpu is online and stays online until we
>>>> execute it? Or also extend this to a work_on_cpumask_safe() variant, so that we
>>>> don't need to try all cpus in the mask (to see if online)?
>>> But all cpus on this hctx->cpumask could become offline.
>> If all hctx->cpumask are offline then we should not allocate a request and
>> this is acceptable. Maybe I am missing your point.
> As you saw, this API has the above problem too, but any one of the CPUs
> may come online later, maybe even during blk_mq_alloc_request_hctx(),
> and that can easily cause inconsistency.
>
> You didn't share your use case, but for an nvme connection request, if it
> is a 1:1 mapping and any one of the CPUs goes offline, the controller
> initialization could fail, which isn't good from the user's viewpoint at
> all.

My use case is in the SCSI EH domain. For my HBA controller of interest, to
abort an erroneous IO we must send a controller-proprietary abort
command on the same HW queue as the original command. So we would need to
allocate this abort request for a specific HW queue.

I mentioned before that if no cpu in hctx->cpumask is online then we don't
need to allocate a request. That is because, if no cpu in hctx->cpumask is
online, the original erroneous IO must already have completed, given how
the blk-mq cpu hotplug handler works, i.e. the HW queue is drained. We then
no longer need to abort it, so it is ok to not get a request.

I have an RFC series for this work in which I am using
blk_mq_alloc_request_hctx(). However, as I mentioned before, I can
experiment with using something like work_on_cpu_safe() to allocate and
execute the abort request, to safeguard against cpu hotplug events.
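
To make the idea a bit more concrete, here is a rough userspace C model
of it. It is not kernel code and is not taken from the RFC series. All
names prefixed model_ are hypothetical: a plain 64-bit bitmask stands in
for hctx->cpumask and for the online cpu mask, and the callback stands in
for allocating and executing the abort request on the chosen cpu. It
does not model the hotplug locking that would keep the cpu online while
the work runs; it only shows the "pick an online cpu from the mask, else
skip the abort" decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NO_CPU (-1)

/* Hypothetical stand-in for a cpumask: bit n set means cpu n is in the mask. */
typedef uint64_t model_cpumask_t;

/* Hypothetical stand-in for the work run on the chosen cpu. */
typedef long (*model_work_fn)(int cpu, void *arg);

/* Return the lowest cpu present in both masks, or MODEL_NO_CPU if none. */
static int model_first_online_cpu(model_cpumask_t hctx_mask,
                                  model_cpumask_t online_mask)
{
    model_cpumask_t both = hctx_mask & online_mask;

    for (int cpu = 0; cpu < 64; cpu++) {
        if (both & ((model_cpumask_t)1 << cpu))
            return cpu;
    }
    return MODEL_NO_CPU;
}

/*
 * Model of the "work_on_cpumask_safe()" idea: run fn on one online cpu of
 * the HW queue's mask. Returns 1 if fn ran (its return value is stored in
 * *result), or 0 if no cpu of the mask is online, in which case the abort
 * is skipped because the original IO is assumed to be already drained.
 */
static int model_abort_on_hctx(model_cpumask_t hctx_mask,
                               model_cpumask_t online_mask,
                               model_work_fn fn, void *arg, long *result)
{
    int cpu;

    assert(fn != NULL);
    assert(result != NULL);

    cpu = model_first_online_cpu(hctx_mask, online_mask);
    if (cpu == MODEL_NO_CPU)
        return 0;

    *result = fn(cpu, arg);
    return 1;
}

/* Example callback: records the cpu it "ran on" and reports success. */
static long model_send_abort(int cpu, void *arg)
{
    *(int *)arg = cpu;
    return 0;
}
```

In this model, a HW queue mapped to cpus 2 and 3 with only cpu 3 online
would run model_send_abort() for cpu 3, while the same queue with only
cpus 0 and 1 online would skip the abort altogether.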

Thanks,
John