Message-ID: <CACVXFVOSiCTRXOT=hN1hJnseUrNy3t2q79dW6_qQuuRJ2KB4Yw@mail.gmail.com>
Date:	Tue, 30 Jun 2015 17:06:20 +0800
From:	Ming Lei <tom.leiming@...il.com>
To:	Akinobu Mita <akinobu.mita@...il.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts
 handling requests

On Mon, Jun 29, 2015 at 10:25 PM, Akinobu Mita <akinobu.mita@...il.com> wrote:
> 2015-06-28 1:08 GMT+09:00 Ming Lei <tom.leiming@...il.com>:
>> On Sat, Jun 27, 2015 at 1:14 AM, Akinobu Mita <akinobu.mita@...il.com> wrote:
>>> Akinobu Mita <akinobu.mita@...il.com> wrote:
>>>> 2015-06-26 0:40 GMT+09:00 Ming Lei <tom.leiming@...il.com>:
>>>> > On Thu, 25 Jun 2015 21:49:43 +0900
>>>> > Akinobu Mita <akinobu.mita@...il.com> wrote:
>>>> >> For example, there is a single hw queue (hctx) and two CPU queues
>>>> >> (ctx0 for CPU0, and ctx1 for CPU1).  Now CPU1 has just been onlined,
>>>> >> a request is inserted into ctx1->rq_list, and bit0 is set in the
>>>> >> pending bitmap because ctx1->index_hw is still zero.
>>>> >>
>>>> >> And then while running the hw queue, flush_busy_ctxs() finds bit0 set
>>>> >> in the pending bitmap and tries to retrieve requests from
>>>> >> hctx->ctxs[0].rq_list.  But hctx->ctxs[0] is ctx0, so the request in
>>>> >> ctx1->rq_list is ignored.
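
(To make the index mismatch above concrete: a tiny userspace sketch, not
kernel code; the struct and the names below are made up purely to show
why bit0 resolves to the wrong ctx.)

/* Toy model of ctx->index_hw vs. hctx->ctxs[] getting out of sync. */
#include <stdio.h>

struct toy_ctx {
	int index_hw;
	int nr_requests;	/* stands in for ctx->rq_list */
};

int main(void)
{
	struct toy_ctx ctx0 = { .index_hw = 0, .nr_requests = 0 };
	/* CPU1 was just onlined; no remap yet, so its index_hw is still 0 */
	struct toy_ctx ctx1 = { .index_hw = 0, .nr_requests = 1 };
	struct toy_ctx *hctx_ctxs[2] = { &ctx0, NULL };	/* hctx only knows ctx0 */
	unsigned long pending = 0;

	/* insert on CPU1: sets bit ctx1->index_hw, i.e. bit0 */
	pending |= 1UL << ctx1.index_hw;

	/* flush: bit0 is resolved through hctx_ctxs[0], which is ctx0 */
	if (pending & 1UL)
		printf("flushed %d request(s); ctx1 still holds %d\n",
		       hctx_ctxs[0]->nr_requests, ctx1.nr_requests);
	return 0;
}

It prints "flushed 0 request(s); ctx1 still holds 1", which is exactly
the lost request described above.
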
>>>> >
>>>> > Per the current design, the request should have been inserted into ctx0
>>>> > instead of ctx1, because ctx1 isn't mapped yet even though ctx1->cpu has
>>>> > become ONLINE.
>>>> >
>>>> > So how about the following patch, which looks much simpler?
>>>>
>>>> OK, I'll try this patch to see if the problem disappears.
>>>
>>> This doesn't fix the problem.  Because:
>>>
>>>> > diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> > index f537796..2f45b73 100644
>>>> > --- a/block/blk-mq.c
>>>> > +++ b/block/blk-mq.c
>>>> > @@ -1034,7 +1034,12 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
>>>> >         struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
>>>> >
>>>> >         current_ctx = blk_mq_get_ctx(q);
>>>> > -       if (!cpu_online(ctx->cpu))
>>>> > +       /*
>>>> > +        * ctx->cpu may become ONLINE but ctx hasn't been mapped to
>>>> > +        * hctx yet because there is a tiny race window between
>>>> > +        * ctx->cpu ONLINE and doing the remap
>>>> > +        */
>>>> > +       if (!blk_mq_ctx_mapped(ctx))
>>>> >                 rq->mq_ctx = ctx = current_ctx;
>>>
>>> The process running on the just-onlined CPU1 in the above example can
>>> satisfy this condition, and current_ctx will be ctx1.  So the same
>>> scenario can happen (the request is ignored by flush_busy_ctxs).
>>
>> Yeah, that is possible, and it should be a bug in blk_mq_insert_request(),
>> because the function assumes that the current ctx is mapped.
>>
>> Then I think the approach in your 1st email of this thread may be a
>> good one, and it looks like we have to do the remapping in the
>> CPU UP_PREPARE notifier.
>
> OK, we can move on to the other topic that you suggested in the other
> mail thread.

Why? That is just what the patch in this thread is doing. :-)
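
In other words, roughly something like the sketch below (a sketch only,
not the actual patch; it reuses the existing blk_mq_queue_reinit_notify()
/ blk_mq_queue_reinit() / all_q_list machinery in blk-mq.c and leaves the
queue freeze/unfreeze steps out): handle CPU_UP_PREPARE in the hotplug
notifier so the ctx->hctx remap is done before the new CPU can insert
requests.

static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
				      unsigned long action, void *hcpu)
{
	struct request_queue *q;

	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_UP_PREPARE:	/* remap before the CPU may submit requests */
	case CPU_DEAD:
		mutex_lock(&all_q_mutex);
		list_for_each_entry(q, &all_q_list, node)
			blk_mq_queue_reinit(q);
		mutex_unlock(&all_q_mutex);
		break;
	default:
		break;
	}
	return NOTIFY_OK;
}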

>
>>> I found a simple alternative solution that assigns each offline CPU a
>>> unique ctx->index_hw.
>>
>> This solution looks simpler, but it may not be correct.
>
> You are right.  This solution needs amendments, just in case we need
> to come back to this solution instead of the first approach.
>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 594eea0..a8fcfbf 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -1787,10 +1787,11 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>>>          */
>>>         queue_for_each_ctx(q, ctx, i) {
>>>                 /* If the cpu isn't online, the cpu is mapped to first hctx */
>>> -               if (!cpu_online(i))
>>> -                       continue;
>>> +               if (!cpu_online(i) && cpu_possible(i))
>>
>> The cpu_possible() check isn't needed.
>
> OK.
>
>>> +                       hctx = q->mq_ops->map_queue(q, 0);
>>> +               else
>>> +                       hctx = q->mq_ops->map_queue(q, i);
>>
>> The above change assumes that all offline CPUs (ctxs)
>> share the same 'hctx' mapped from CPU 0, and that is
>> obviously wrong.
>>
>> All offline CPUs should have shared the 1st 'hctx' instead
>> of the 'hctx' mapped from CPU 0.
>
> Do you mean that we should use 'q->queue_hw_ctx[0]' for offline CPUs?

Yes, that is the 1st hctx, but I don't think it is a good idea to add the
unmapped ctx into the 1st hctx, because it may bring side effects.

>
>>>
>>> -               hctx = q->mq_ops->map_queue(q, i);
>>>                 cpumask_set_cpu(i, hctx->cpumask);
>>
>> CPU i shouldn't be set in hctx->cpumask in this approach
>> if it isn't online.
>
> OK.
>
>>>                ctx->index_hw = hctx->nr_ctx;
>>>                hctx->ctxs[hctx->nr_ctx++] = ctx;
>>
>> I am not sure the current code is ready for adding an offline 'ctx' into
>> a 'hctx', and there are some observable effects at least:
>>
>> - blk_mq_run_hw_queue() can run even if the hctx has no mapped
>> 'ctx'
>
> Is this fixed by not setting hctx->cpumask for offline CPUs?
>
>> - the offline ctx kobject will be exposed to user space in sysfs
>
> OK.  This should be avoided.
>
>> - blk_mq_hw_queue_mapped() may become always true
>> - set->tags[i] (request entries) may not be freed even if there
>> are no mapped 'ctx' in one 'hctx'
>
> Don't these only happen for the 1st hctx (i.e. 'q->queue_hw_ctx[0]')?

I mean it may bring lots of issues if you add the unmapped ctx into
the 1st hctx via 'hctx->ctxs[hctx->nr_ctx++] = ctx', which may make the
fix quite complicated.
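
For example, if I remember the helper right, blk_mq_hw_queue_mapped() is
roughly the following (quoted from memory, so take it as a sketch rather
than the exact source):

static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
{
	return hctx->nr_ctx && hctx->tags;
}

So once every offline ctx is counted in hctx->nr_ctx, the 1st hctx always
looks mapped and its tags can't be released, which is what the last two
points above are about.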

So I think it will be good to do the remapping in the CPU UP_PREPARE
notifier, and I will review this patch later.

Thanks,
-- 
Ming Lei
