Message-ID: <ee9bf8df-844b-4e68-b941-c66f5406375c@flourine.local>
Date: Thu, 23 Jan 2025 13:54:54 +0100
From: Daniel Wagner <dwagner@...e.de>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Christoph Hellwig <hch@....de>, LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>, Daniel Wagner <wagi@...nel.org>, Hannes Reinecke <hare@...e.de>,
Ming Lei <ming.lei@...hat.com>, John Garry <john.g.garry@...cle.com>,
Jens Axboe <axboe@...nel.dk>
Subject: Re: WARNING: CPU: 3 PID: 1 at block/blk-mq-cpumap.c:90
blk_mq_map_hw_queues+0xf3/0x100
On Thu, Jan 23, 2025 at 08:59:57AM +0100, Daniel Wagner wrote:
> On Wed, Jan 22, 2025 at 05:58:17PM -0500, Steven Rostedt wrote:
> > On Wed, 22 Jan 2025 12:54:45 -0500
> > Steven Rostedt <rostedt@...dmis.org> wrote:
> >
> > > Not sure its related. I can see how reproducible this is, and if it is, I
> > > can try to bisect it.
> >
> > I bisected it down to: a5665c3d150c98 ("virtio: blk/scsi: replace
> > blk_mq_virtio_map_queues with blk_mq_map_hw_queues")
> >
> > And reverting that as well as:
> >
> > 9bc1e897a821f ("blk-mq: remove unused queue mapping helpers")
> >
> > It booted fine.
>
> In the previous tests, did you just comment out the WARN_ON_ONCE or did
> you also replace blk_mq_clear_mq_map with blk_mq_map_queues?
> blk_mq_clear_mq_map maps all queues to CPU 0, and if you offline CPU 0,
> there is nothing left to serve the hctx. I'll try to reproduce your
> test and see if my idea works.
I've reproduced the second crash as well using your good old
stress-cpu-hotplug script. With blk_mq_clear_mq_map all CPUs are mapped
to the first hctx, and when a CPU is offlined,
blk_mq_hctx_notify_offline is not happy about not finding any CPU
mapped to the hctx.
The patch below should fix your problem. I've tested the different
setups and all of them looked good to me.
From ad6e1bd1705e127c191caf2d3becc6ce989e8d96 Mon Sep 17 00:00:00 2001
From: Daniel Wagner <wagi@...nel.org>
Date: Thu, 23 Jan 2025 12:59:58 +0100
Subject: [PATCH] blk-mq: create correct map for fallback case
The fallback code in blk_mq_map_hw_queues originates from
blk_mq_pci_map_queues and was added to handle the case where
pci_irq_get_affinity returns NULL for !SMP configurations.
Besides blk_mq_pci_map_queues, blk_mq_map_hw_queues also replaces
blk_mq_virtio_map_queues, which used blk_mq_map_queues for the
fallback.
It's possible to use blk_mq_map_queues for both cases, though.
For !SMP, blk_mq_map_queues creates the same map as
blk_mq_clear_mq_map, that is, CPU 0 is mapped to hctx 0.
The WARN_ON_ONCE has to be dropped for virtio, as the fallback is also
taken by default for certain configurations. There is still a
WARN_ON_ONCE check in lib/group_cpus.c:
WARN_ON(nr_present + nr_others < numgrps);
which triggers if the caller tries to create more hardware queues
than CPUs. It checks the same condition as the WARN_ON_ONCE in
blk_mq_pci_map_queues did.
Reported-by: Steven Rostedt <rostedt@...dmis.org>
Signed-off-by: Daniel Wagner <wagi@...nel.org>
---
block/blk-mq-cpumap.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index ad8d6a363f24..444798c5374f 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -87,7 +87,6 @@ void blk_mq_map_hw_queues(struct blk_mq_queue_map *qmap,
return;
fallback:
- WARN_ON_ONCE(qmap->nr_queues > 1);
- blk_mq_clear_mq_map(qmap);
+ blk_mq_map_queues(qmap);
}
EXPORT_SYMBOL_GPL(blk_mq_map_hw_queues);
--
2.48.1