linux-kernel - Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

Message-ID: <20180117095744.GF9487@ming.t460p>
Date:   Wed, 17 Jan 2018 17:57:45 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     "jianchao.wang" <jianchao.w.wang@...cle.com>
Cc:     linux-block@...r.kernel.org, Keith Busch <keith.busch@...el.com>,
        Sagi Grimberg <sagi@...mberg.me>,
        Christoph Hellwig <hch@...radead.org>,
        Stefan Haberland <sth@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
        James Smart <james.smart@...adcom.com>,
        Jens Axboe <axboe@...com>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each
 possisble CPU

Hi Jianchao,

On Wed, Jan 17, 2018 at 04:09:11PM +0800, jianchao.wang wrote:
> Hi Ming,
> 
> Thanks for your kind response.
> 
> On 01/17/2018 02:22 PM, Ming Lei wrote:
> > This warning can't be removed completely. For example, the CPU chosen
> > by blk_mq_hctx_next_cpu(hctx) can be brought online again just after the
> > following call returns and before __blk_mq_run_hw_queue() is scheduled
> > to run.
> > 
> > 	kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work, msecs_to_jiffies(msecs))
> We could use cpu_active in __blk_mq_run_hw_queue() to narrow the window.
> There is a big gap between cpu_online and cpu_active; rebind_workers() also runs in between.

This warning is harmless, and I guess you can't reproduce it without the help
of your special patch :-), so the window shouldn't be a big deal.
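To make the window concrete, here is a minimal userspace model of the check under discussion. It assumes the warning fires when the queue is run from a CPU outside hctx->cpumask while hctx->next_cpu is online (the shape of the check is paraphrased, not copied from the kernel), and compares that with the cpu_active-based variant proposed above. CPU masks are plain bitmasks, and `struct cpu_state`, `cpu_in()`, `warns_online()` and `warns_active()` are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical userspace model: bit N set means CPU N is in the mask. */
struct cpu_state {
	unsigned long online;	/* CPUs that pass cpu_online() */
	unsigned long active;	/* CPUs that pass cpu_active(); a subset of online */
};

static bool cpu_in(unsigned long mask, int cpu)
{
	return (mask >> cpu) & 1UL;
}

/*
 * Assumed shape of the existing check: the queue runs on a CPU that is
 * not mapped to this hctx, yet hctx->next_cpu is online.
 */
static bool warns_online(const struct cpu_state *s, unsigned long hctx_mask,
			 int run_cpu, int next_cpu)
{
	return !cpu_in(hctx_mask, run_cpu) && cpu_in(s->online, next_cpu);
}

/*
 * cpu_active-based variant: a CPU that is online but not yet active
 * (still part-way through hotplug bring-up) does not trigger it, which
 * narrows the window but does not close it.
 */
static bool warns_active(const struct cpu_state *s, unsigned long hctx_mask,
			 int run_cpu, int next_cpu)
{
	return !cpu_in(hctx_mask, run_cpu) && cpu_in(s->active, next_cpu);
}
```

For example, with a 1-to-1 mapping where the hctx owns only CPU 2 (`hctx_mask = 1UL << 2`), suppose the work ends up running on CPU 0 because CPU 2 was offline when it was queued. If CPU 2 is online but not yet active by the time the work runs, `warns_online()` returns true while `warns_active()` returns false; once CPU 2 is active as well, both return true.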

But the delay (msecs_to_jiffies(msecs)) passed to kblockd_mod_delayed_work_on()
can be a problem, because during that period:

1) hctx->next_cpu can go from offline to online before __blk_mq_run_hw_queue()
runs; your warning is triggered, but it is harmless.

2) hctx->next_cpu can go from online to offline before __blk_mq_run_hw_queue()
runs; there is no warning, but once the IO has been submitted to the hardware
and completes, how does the HBA/hw queue notify a CPU when all CPUs assigned
to this hw queue (irq vector) are offline? blk-mq's timeout handler may cover
that, but relying on it looks too tricky.
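Case 2 can be illustrated with a rough model of the completion side. It assumes that a hw queue's completion interrupt can only be delivered to an online CPU in that vector's affinity mask (with the 1-to-1 NVMe mapping, the mask holds a single CPU). The hypothetical helper `completion_cpu()` picks a delivery CPU and returns -1 when none is left, which is the situation where only something like the blk-mq timeout handler would reclaim the request.

```c
/*
 * Hypothetical model: return the first online CPU in the vector's
 * affinity mask, or -1 if every CPU assigned to it is offline.
 * nr_cpus must not exceed the number of bits in unsigned long.
 */
static int completion_cpu(unsigned long irq_affinity, unsigned long online,
			  int nr_cpus)
{
	unsigned long usable = irq_affinity & online;

	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if ((usable >> cpu) & 1UL)
			return cpu;
	return -1;	/* completion stranded: no online CPU can take the interrupt */
}
```

With a vector bound only to CPU 3, `completion_cpu(1UL << 3, online, 8)` returns 3 while CPU 3 is online and -1 once it has been offlined, even though the request is already on the hardware.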

> 
> > 
> > Just curious: how do you trigger this issue? Is it triggered in a CPU
> > hotplug stress test, or in a normal use case?
> 
> In fact, this came from my own investigation into whether .queue_rq for a hardware queue could be executed on
> a CPU that is not mapped to it. Eventually I found this hole during CPU hotplug.
> I ran the test on an NVMe device that has a 1-to-1 mapping between CPUs and hctxs:
>  - A special patch that could hold some requests on ctx->rq_list through .get_budget
>  - A script that issues IOs with fio
>  - A script that onlines/offlines the CPUs continuously
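The third step is normally driven through the standard sysfs file /sys/devices/system/cpu/cpuN/online, which accepts "0" and "1". Below is a minimal C sketch of such a stress loop, not the script used in this report: it must run as root, it skips CPU 0 because that CPU is often not hot-pluggable, and the helper names (`cpu_online_path()`, `set_cpu_online()`, `hotplug_stress()`), the CPU range and the iteration count are assumptions.

```c
#include <stdio.h>

/* Build the sysfs path controlling CPU `cpu`; returns 0 on success. */
static int cpu_online_path(char *buf, size_t len, int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" (online) or "0" (offline) to the CPU's sysfs file. Needs root. */
static int set_cpu_online(int cpu, int online)
{
	char path[64];
	FILE *f;

	if (cpu_online_path(path, sizeof(path), cpu))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(online ? "1" : "0", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* The buffered write reaches sysfs here, so hotplug errors show up in fclose(). */
	return fclose(f) == 0 ? 0 : -1;
}

/* Take CPUs 1..last_cpu offline and bring them back, `iterations` times. */
static void hotplug_stress(int last_cpu, int iterations)
{
	for (int i = 0; i < iterations; i++) {
		for (int cpu = 1; cpu <= last_cpu; cpu++)
			set_cpu_online(cpu, 0);
		for (int cpu = 1; cpu <= last_cpu; cpu++)
			set_cpu_online(cpu, 1);
	}
}
```

For instance, `hotplug_stress(3, 1000)` would cycle CPUs 1-3 a thousand times while fio keeps IO flowing to the device.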

Thanks for sharing your reproduction approach.

Without a handler for CPU hotplug, it isn't easy to avoid the warning
completely in __blk_mq_run_hw_queue().

> At first there was just the warning above. Then, after this patch was introduced, a panic came up.

We have to fix the panic, so I will post the patch you tested in this thread.

Thanks,
Ming