linux-kernel - Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts handling requests


Message-ID: <CAC5umyhPQMuZCF4DMfL1kwVBFfeAheW0tTtju=qcrU=yFhPofw@mail.gmail.com>
Date:	Thu, 25 Jun 2015 21:49:43 +0900
From:	Akinobu Mita <akinobu.mita@...il.com>
To:	Ming Lei <tom.leiming@...il.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts
 handling requests

2015-06-25 17:07 GMT+09:00 Ming Lei <tom.leiming@...il.com>:
> On Thu, Jun 25, 2015 at 10:56 AM, Akinobu Mita <akinobu.mita@...il.com> wrote:
>> 2015-06-25 1:24 GMT+09:00 Ming Lei <tom.leiming@...il.com>:
>>> On Wed, Jun 24, 2015 at 10:34 PM, Akinobu Mita <akinobu.mita@...il.com> wrote:
>>>> Hi Ming,
>>>>
>>>> 2015-06-24 18:46 GMT+09:00 Ming Lei <tom.leiming@...il.com>:
>>>>> On Sun, Jun 21, 2015 at 9:52 PM, Akinobu Mita <akinobu.mita@...il.com> wrote:
>>>>>> ctx->index_hw is zero for the CPUs which have never been onlined since
>>>>>> the block queue was initialized.  If one of those CPUs is hotadded and
>>>>>> starts handling request before new mappings are established, pending
>>>>>
>>>>> Could you explain a bit what the handling request is? The fact is that
>>>>> blk_mq_queue_reinit() is run after all queues are put into freezing.
>>>>
>>>> Notifier callbacks for CPU_ONLINE action can be run on the other CPU
>>>> than the CPU which was just onlined.  So it is possible for the
>>>> process running on the just onlined CPU to insert request and run
>>>> hw queue before blk_mq_queue_reinit_notify() is actually called with
>>>> action=CPU_ONLINE.
>>>
>>> You are right because blk_mq_queue_reinit_notify() is always run after
>>> the CPU becomes UP, so there is a tiny window in which the CPU is up
>>> but the mapping is not yet updated.  Per current design, the CPU just onlined
>>> is still mapped to hw queue 0 until the mapping is updated by
>>> blk_mq_queue_reinit_notify().
>>>
>>> But I am wondering why it is a problem and why you think flush_busy_ctxs
>>> can't find the requests on the software queue in this situation?
>>
>> The problem happens when the CPU has just been onlined for the first
>> time since the request queue was initialized.  At this time ctx->index_hw
>> for the CPU is still zero because blk_mq_queue_reinit_notify() has not
>> been called yet.
>>
>> The request can be inserted into ctx->rq_list, but blk_mq_hctx_mark_pending()
>> marks the wrong bit position busy, as ctx->index_hw is zero.
>
> It isn't the wrong bit, since the CPU just onlined is still mapped to hctx 0
> at that time.

ctx->index_hw is not the CPU queue to HW queue mapping.
ctx->index_hw is the index in hctx->ctxs[] for this ctx.
Each ctx in a hw queue should have a unique ctx->index_hw.
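
As a rough illustration of that point, here is a minimal user-space
sketch, not kernel code.  The names sw_ctx, hw_ctx, map_ctx and MAX_CTX
are hypothetical stand-ins for struct blk_mq_ctx, struct blk_mq_hw_ctx
and the mapping step; the only thing it is meant to show is that
index_hw is the slot a ctx occupies in the hctx's ctxs[] array, so it
is only meaningful after the ctx has been mapped.

```c
#include <assert.h>

#define MAX_CTX 4               /* hypothetical limit for this sketch */

struct sw_ctx {                 /* stand-in for struct blk_mq_ctx */
    unsigned int cpu;
    unsigned int index_hw;      /* slot of this ctx in hw_ctx.ctxs[] */
};

struct hw_ctx {                 /* stand-in for struct blk_mq_hw_ctx */
    struct sw_ctx *ctxs[MAX_CTX];
    unsigned int nr_ctx;        /* number of slots currently in use */
};

/* Attach a software queue to a hardware queue in the next free slot. */
static void map_ctx(struct hw_ctx *hw, struct sw_ctx *ctx)
{
    assert(hw->nr_ctx < MAX_CTX);
    ctx->index_hw = hw->nr_ctx;         /* unique within this hw queue */
    hw->ctxs[hw->nr_ctx++] = ctx;
}
```

A ctx that has never gone through map_ctx() keeps index_hw == 0 from
zero-initialization, which is the same value as the first mapped ctx.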

This problem is reproducible even with a single hw queue; the script in
the cover letter does so.

>> flush_busy_ctxs() only retrieves the requests from software queues
>> which are marked busy.  So the request just inserted is ignored as
>> the corresponding bit position is not busy.
>
> Before making the remap in blk_mq_queue_reinit() for the CPU topology
> change, the request queue will be put into freezing first, and all
> requests inserted to hctx 0 should be retrieved and scheduled out. So
> can the request be ignored by flush_busy_ctxs()?

For example, there is a single hw queue (hctx) and two CPU queues
(ctx0 for CPU0, and ctx1 for CPU1).  Now CPU1 has just been onlined, a
request is inserted into ctx1->rq_list, and bit0 is set in the pending
bitmap because ctx1->index_hw is still zero.

Then, while running the hw queue, flush_busy_ctxs() finds bit0 set in
the pending bitmap and tries to retrieve requests from
hctx->ctxs[0]->rq_list.  But hctx->ctxs[0] is ctx0, so the request in
ctx1->rq_list is ignored.
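
The scenario above can be modelled in user space roughly as follows.
This is only a sketch under simplifying assumptions, not the blk-mq
code: sw_ctx, hw_ctx, insert_request, flush_busy and
stranded_after_hotadd are hypothetical names, a counter stands in for
ctx->rq_list, and a single unsigned long stands in for the pending
bitmap.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CTX 2               /* hypothetical: ctx0 and ctx1 only */

struct sw_ctx {                 /* stand-in for struct blk_mq_ctx */
    int nr_queued;              /* stand-in for the length of rq_list */
    unsigned int index_hw;      /* slot of this ctx in hw_ctx.ctxs[] */
};

struct hw_ctx {                 /* stand-in for struct blk_mq_hw_ctx */
    struct sw_ctx *ctxs[MAX_CTX];
    unsigned long pending;      /* one bit per slot in ctxs[] */
};

/* Queue one request on ctx and mark the slot named by index_hw busy. */
static void insert_request(struct hw_ctx *hw, struct sw_ctx *ctx)
{
    assert(ctx->index_hw < MAX_CTX);
    ctx->nr_queued++;
    hw->pending |= 1UL << ctx->index_hw;
}

/* Drain only the slots whose pending bit is set; return the count. */
static int flush_busy(struct hw_ctx *hw)
{
    int flushed = 0;

    for (unsigned int bit = 0; bit < MAX_CTX; bit++) {
        if (!(hw->pending & (1UL << bit)) || hw->ctxs[bit] == NULL)
            continue;
        hw->pending &= ~(1UL << bit);
        flushed += hw->ctxs[bit]->nr_queued;
        hw->ctxs[bit]->nr_queued = 0;
    }
    return flushed;
}

/*
 * CPU1 queues one request and the hw queue is run once.
 * mapped == 0: ctx1 has not been remapped yet (index_hw stale at 0).
 * mapped != 0: ctx1 already occupies slot 1.
 * Returns the number of requests left behind on ctx1.
 */
static int stranded_after_hotadd(int mapped)
{
    struct sw_ctx ctx0 = { .nr_queued = 0, .index_hw = 0 };
    struct sw_ctx ctx1 = { .nr_queued = 0, .index_hw = mapped ? 1 : 0 };
    struct hw_ctx hw = { .ctxs = { &ctx0, mapped ? &ctx1 : NULL },
                         .pending = 0 };

    insert_request(&hw, &ctx1);
    flush_busy(&hw);
    return ctx1.nr_queued;
}
```

In this model stranded_after_hotadd(0) leaves the request on ctx1,
because bit0 leads the flush to ctx0, while stranded_after_hotadd(1)
drains it.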