Date: Tue, 29 Jun 2021 09:20:27 +0800
From: Ming Lei <ming.lei@...hat.com>
To: wenxiong@...ibm.com
Cc: Daniel Wagner <dwagner@...e.de>, linux-kernel@...r.kernel.org, james.smart@...adcom.com, wenxiong@...ibm.com, sagi@...mberg.me
Subject: Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port

Hi Wenxiong,

On Mon, Jun 28, 2021 at 01:17:34PM -0500, wenxiong wrote:
> > The root cause is that blk-mq doesn't work well on tag allocation from
> > specified hctx(blk_mq_alloc_request_hctx), and blk-mq assumes that any
> > request allocation can't cross hctx inactive/offline, see
> > blk_mq_hctx_notify_offline()
>
> Hi Ming,
>
> I tried to pass online cpu_id(like cpu=8 in my case) to
> blk_mq_alloc_request_hctx(),
> data.hctx = q->queue_hw_ctx[hctx_idx];
> but looks like data.hctx returned with NULL. So system crashed if accessing
> data.hctx later.
>
> blk-mq request allocation can't cross hctx inactive/offline but blk-mq still
> reallocate the hctx for offline cpus(like cpu=4,5,6,7 in my case) in
> blk_mq_realloc_hw_ctxs() and hctx are NULL for online(cpu=8 in my case) cpus.
>
> Below is my understanding for hctxs, please correct me if I am wrong:
>
> Assume a system has two cores with 16 cpus:
> Before doing cpu hot plug events:
> cpu0-cpu7(core 0) : hctx->state is ACTIVE and q->hctx is not NULL.
> cpu8-cpu15(core 1): hctx->state is ACTIVE and q->hctx is not NULL
>
> After doing cpu hot plug events(the second half of each core are offline)
> cpu0-cpu3: online, hctx->state is ACTIVE and q->hctx is not NULL.
> cpu4-cpu7: offline, hctx->state is INACTIVE and q->hctx is not NULL
> cpu8-cpu11: online, hctx->state is ACTIVE but q->hctx = NULL
> cpu12-cpu15: offline, hctx->state is INACTIVE and q->hctx = NULL
>
> So num_online_cpus() is 8 after cpu hotplug events. Either way not working
> for me, no matter I pass 8 online cpus or 4 online/4 offline cpus.
>
> Is this correct? If nvmf pass online cpu ids to blk-mq, why it still
> crashes/fails?
NVMe users have to pass a valid hctx_idx to blk_mq_alloc_request_hctx(), but from the info you provided, they don't, so q->queue_hw_ctx[hctx_idx] is NULL and the kernel panics.

I believe the following patch from Daniel may fix this specific issue if your controller is FC:

[1] https://lore.kernel.org/linux-nvme/YNXTaUMAFCA84jfZ@T590/T/#t

Thanks,
Ming
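The point of the reply is that the caller must hand blk_mq_alloc_request_hctx() an hctx_idx whose q->queue_hw_ctx[] slot is populated. A minimal user-space sketch of the checks such a caller might perform follows, assuming a simplified model: `struct sim_hctx`, `struct sim_queue` and `sim_pick_cpu_for_hctx()` are hypothetical and are not kernel structures or APIs, and the -EINVAL/-EXDEV return values are illustrative only. It rejects an out-of-range index, a NULL slot (the case that panicked in the report), and an hctx whose CPUs are all offline.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_CPUS 16

/* Hypothetical model of a hardware context: which CPUs it serves.
 * This is NOT the kernel's struct blk_mq_hw_ctx. */
struct sim_hctx {
    bool mapped_cpu[SIM_NR_CPUS];
};

/* Hypothetical model of a request queue's hctx table; slots may be NULL,
 * as observed in the report after CPU hotplug. */
struct sim_queue {
    unsigned int nr_hw_queues;
    struct sim_hctx **queue_hw_ctx;
};

/*
 * Validate hctx_idx before use. Returns the first online CPU served by
 * that hctx, or a negative errno if the index must not be used.
 */
int sim_pick_cpu_for_hctx(const struct sim_queue *q, unsigned int hctx_idx,
                          const bool online[SIM_NR_CPUS])
{
    assert(q != NULL && online != NULL);

    if (hctx_idx >= q->nr_hw_queues)
        return -EINVAL;          /* not a valid hardware-queue index */

    const struct sim_hctx *hctx = q->queue_hw_ctx[hctx_idx];
    if (hctx == NULL)
        return -EXDEV;           /* unpopulated slot: dereferencing it would crash */

    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        if (hctx->mapped_cpu[cpu] && online[cpu])
            return cpu;          /* first online CPU mapped to this hctx */

    return -EXDEV;               /* every CPU of this hctx is offline */
}
```

With a three-slot table where slot 1 is NULL and slot 2 serves only offline CPUs, only index 0 yields a CPU; the other indices return an error instead of dereferencing NULL.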