Message-ID: <977e9c62-c7f2-d1df-7d6b-5903f3b21cb6@oracle.com>
Date: Wed, 17 Jan 2018 16:09:11 +0800
From: "jianchao.wang" <jianchao.w.wang@...cle.com>
To: Ming Lei <ming.lei@...hat.com>
Cc: linux-block@...r.kernel.org, Keith Busch <keith.busch@...el.com>,
Sagi Grimberg <sagi@...mberg.me>,
Christoph Hellwig <hch@...radead.org>,
Stefan Haberland <sth@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
James Smart <james.smart@...adcom.com>,
Jens Axboe <axboe@...com>,
Christian Borntraeger <borntraeger@...ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Christoph Hellwig <hch@....de>
Subject: Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each
possisble CPU
Hi Ming,
Thanks for your kind response.
On 01/17/2018 02:22 PM, Ming Lei wrote:
> This warning can't be removed completely, for example, the CPU figured
> in blk_mq_hctx_next_cpu(hctx) can be put on again just after the
> following call returns and before __blk_mq_run_hw_queue() is scheduled
> to run.
>
> kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work, msecs_to_jiffies(msecs))
We could use cpu_active in __blk_mq_run_hw_queue() to narrow the window.
There is a big gap between cpu_online and cpu_active during hotplug, and rebind_workers also runs between the two.
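To illustrate why the choice of predicate matters, here is a rough userspace model, not kernel code. The stage names, the struct and the shape of the check in would_warn() are assumptions made up for this sketch; only the ordering (online is set before active, with worker rebinding in between) comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. The two bits stand in for what cpu_online()
 * and cpu_active() would report for the mapped CPU in the kernel. */
struct cpu_state {
    bool online;
    bool active;
};

/* Hypothetical bring-up stages: a CPU is marked online first, its
 * workers are rebound in between, and it is marked active last. */
enum hotplug_stage {
    STAGE_OFFLINE,
    STAGE_ONLINE_ONLY,   /* online set, active not yet set */
    STAGE_FULLY_ACTIVE   /* both online and active set */
};

struct cpu_state cpu_state_at(enum hotplug_stage stage)
{
    struct cpu_state s = { false, false };

    assert(stage <= STAGE_FULLY_ACTIVE);
    if (stage >= STAGE_ONLINE_ONLY)
        s.online = true;
    if (stage == STAGE_FULLY_ACTIVE)
        s.active = true;
    return s;
}

/* Assumed shape of the check: complain when the queue ran on a CPU it
 * is not mapped to although the mapped CPU was considered usable.
 * check_active selects the active bit instead of the online bit. */
bool would_warn(bool ran_on_mapped_cpu, struct cpu_state mapped,
                bool check_active)
{
    bool usable = check_active ? mapped.active : mapped.online;

    return !ran_on_mapped_cpu && usable;
}
```

In this model, a run on the wrong CPU while the mapped CPU is in STAGE_ONLINE_ONLY trips the online-based check but not the active-based one, which is the narrowing meant above. It does not close the window completely.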
>
> Just be curious how you trigger this issue? And is it triggered in CPU
> hotplug stress test? Or in a normal use case?
Actually, this came out of my own investigation into whether the .queue_rq of a hardware queue could be executed on
a CPU it is not mapped to. In the end, I found this hole during CPU hotplug.
I did the test on an NVMe device, which has a 1-to-1 mapping between CPU and hctx. The setup:
- A special patch that holds some requests on ctx->rq_list through .get_budget
- A script that issues IOs with fio
- A script that onlines/offlines the CPUs continuously
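For anyone who wants to reproduce the hotplug part, a minimal sketch of such a toggling loop could look like the code below. This is not the script used in the test. It assumes the standard sysfs control file /sys/devices/system/cpu/cpuN/online, needs root, and all function names are made up for illustration. On many systems cpu0 cannot be offlined, so the range would normally start at 1.

```c
#include <stdio.h>

/* Build the sysfs path that controls one CPU's online state.
 * Returns 0 on success, -1 if the buffer is too small. */
int cpu_online_path(unsigned int cpu, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the control file. Needs root.
 * Returns 0 on success, -1 on any failure. */
int set_cpu_online(unsigned int cpu, int online)
{
    char path[64];
    FILE *f;
    int ok;

    if (cpu_online_path(cpu, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fputs(online ? "1" : "0", f) >= 0;
    if (fclose(f) != 0)   /* the write may only fail at flush time */
        ok = 0;
    return ok ? 0 : -1;
}

/* Take CPUs first..last offline and bring them back, `rounds` times.
 * Returns the number of failed state changes. */
unsigned int hotplug_stress(unsigned int first, unsigned int last,
                            unsigned int rounds)
{
    unsigned int failures = 0;
    unsigned int r, cpu;

    for (r = 0; r < rounds; r++) {
        for (cpu = first; cpu <= last; cpu++) {
            if (set_cpu_online(cpu, 0) != 0)
                failures++;
            if (set_cpu_online(cpu, 1) != 0)
                failures++;
        }
    }
    return failures;
}
```

A call such as hotplug_stress(1, 3, 1000) would run alongside the fio job, so that queue runs keep racing with CPUs going down and coming back.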
At first, I only saw the warning above. Then, after this patch was introduced, a panic came up.
Thanks
Jianchao