Message-ID: <e2e62f99-164d-4e80-be1e-9affa724efd0@flourine.local>
Date: Fri, 4 Jul 2025 11:21:49 +0200
From: Daniel Wagner <dwagner@...e.de>
To: Hannes Reinecke <hare@...e.de>
Cc: Daniel Wagner <wagi@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Keith Busch <kbusch@...nel.org>, Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
"Michael S. Tsirkin" <mst@...hat.com>, Aaron Tomlin <atomlin@...mlin.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>, Thomas Gleixner <tglx@...utronix.de>,
Costa Shulyupin <costa.shul@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Valentin Schneider <vschneid@...hat.com>, Waiman Long <llong@...hat.com>, Ming Lei <ming.lei@...hat.com>,
Frederic Weisbecker <frederic@...nel.org>, Mel Gorman <mgorman@...e.de>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
linux-nvme@...ts.infradead.org, megaraidlinux.pdl@...adcom.com, linux-scsi@...r.kernel.org,
storagedev@...rochip.com, virtualization@...ts.linux.dev,
GR-QLogic-Storage-Upstream@...vell.com
Subject: Re: [PATCH v7 08/10] blk-mq: use hk cpus only when isolcpus=io_queue
is enabled
On Thu, Jul 03, 2025 at 08:58:02AM +0200, Hannes Reinecke wrote:
> > + for (queue = 0; queue < qmap->nr_queues; queue++) {
> > + unsigned int idx = (qmap->queue_offset + queue) % nr_masks;
> > +
> > + for_each_cpu(cpu, &hk_masks[idx]) {
> > + qmap->mq_map[cpu] = idx;
> > +
> > + if (cpu_online(cpu))
> > + cpumask_set_cpu(qmap->mq_map[cpu], active_hctx);
>
> Why cpu_online? Up until this point it really didn't matter if the affinity
> mask was set to 'online' or 'possible' cpus, but here you
> require CPUs to be online...
This part here tracks whether a hardware context has at least one
housekeeping CPU online. It is possible to provide a configuration
where we end up with hardware contexts whose housekeeping CPUs are
offline while their isolated CPUs are online. active_hctx tracks which
of the hardware contexts are usable, and that is used in the next
step...
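
To make that concrete, here is a simplified userspace sketch of the
tracking (plain arrays instead of the kernel cpumask API, all values
invented):

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8
#define NR_HCTX 2

int main(void)
{
	/* CPUs 0-3 are housekeeping, 4-7 are isolated */
	bool housekeeping[NR_CPUS] = { 1, 1, 1, 1, 0, 0, 0, 0 };
	/* housekeeping CPUs 2 and 3 happen to be offline */
	bool online[NR_CPUS]       = { 1, 1, 0, 0, 1, 1, 1, 1 };
	/* hctx 0 serves CPUs 0-1, hctx 1 serves CPUs 2-3 */
	int  hk_hctx[NR_CPUS]      = { 0, 0, 1, 1, -1, -1, -1, -1 };
	bool active_hctx[NR_HCTX]  = { false, false };
	int  cpu, i;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (housekeeping[cpu] && online[cpu])
			active_hctx[hk_hctx[cpu]] = true;

	/*
	 * hctx 1 stays inactive: both of its housekeeping CPUs are
	 * offline even though isolated CPUs are online. Without the
	 * cpu_online() check it would wrongly look usable.
	 */
	for (i = 0; i < NR_HCTX; i++)
		printf("hctx %d active: %d\n", i, active_hctx[i]);
	return 0;
}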
> > + }
> > + }
> > +
> > + /* Map isolcpus to hardware context */
> > + queue = cpumask_first(active_hctx);
> > + for_each_cpu_andnot(cpu, cpu_possible_mask, mask) {
> > + qmap->mq_map[cpu] = (qmap->queue_offset + queue) % nr_masks;
> > + queue = cpumask_next_wrap(queue, active_hctx);
> > + }
>
> Really? Doesn't this map _all_ cpus, and not just the isolcpus?
for_each_cpu_andnot iterates over all CPUs which are not housekeeping
CPUs (mask is the housekeeping mask), thus these are all isolated
CPUs. Note the 'andnot' part.
cpumask_first/cpumask_next_wrap return only hardware contexts which
have at least one online housekeeping CPU. Yes, it's possible to make
this a bit smarter, so that we keep the grouping of the offline CPUs
intact, though I am not sure it is worth adding complexity for a
corner case, at least not yet.
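
A rough userspace model of that round-robin step (invented masks,
queue_offset dropped for brevity; the helpers are stand-ins for the
cpumask API):

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8
#define NR_HCTX 4

/* stand-in for cpumask_next_wrap(): next active hctx, wrapping */
static int next_active(const bool *active, int cur)
{
	int i;

	for (i = 1; i <= NR_HCTX; i++) {
		int idx = (cur + i) % NR_HCTX;

		if (active[idx])
			return idx;
	}
	return cur;
}

int main(void)
{
	/* CPUs 0-1 are housekeeping, the rest are isolated */
	bool housekeeping[NR_CPUS] = { 1, 1, 0, 0, 0, 0, 0, 0 };
	/* only hctx 0 and 2 have an online housekeeping CPU */
	bool active_hctx[NR_HCTX]  = { true, false, true, false };
	int  mq_map[NR_CPUS] = { 0 };
	int  cpu, queue = 0;

	while (!active_hctx[queue])	/* cpumask_first(active_hctx) */
		queue++;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (housekeeping[cpu])
			continue;	/* the 'andnot' part */
		mq_map[cpu] = queue;
		queue = next_active(active_hctx, queue);
	}

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (!housekeeping[cpu])
			printf("isolated cpu %d -> hctx %d\n",
			       cpu, mq_map[cpu]);
	return 0;
}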
> > +fallback:
> > + /*
> > + * Map all CPUs to the first hctx to ensure at least one online
> > + * housekeeping CPU is serving it.
> > + */
> > + for_each_possible_cpu(cpu)
> > + qmap->mq_map[cpu] = 0;
>
> I think you need to map all hctx, no?
The block layer filters out hctx which have no CPU assigned to them
when selecting a queue. This is really a failsafe mode; it just makes
sure the system boots.
> > + /* Map housekeeping CPUs to a hctx */
> > + for (queue = 0; queue < qmap->nr_queues; queue++) {
> > + for_each_cpu(cpu, dev->bus->irq_get_affinity(dev, offset + queue)) {
> > + qmap->mq_map[cpu] = qmap->queue_offset + queue;
> > +
> > + cpumask_set_cpu(cpu, mask);
> > + if (cpu_online(cpu))
> > + cpumask_set_cpu(qmap->mq_map[cpu], active_hctx);
>
> Now that is really curious. You pick up the interrupt affinity from the
> 'bus', which, I assume, is the PCI bus. And this would imply that the
> bus can (or already is) programmed for this interrupt affinity.
Yes, this is the case. irq_create_affinity_masks uses
group_cpus_evenly/group_mask_cpus_evenly for the number of requested
IRQs. The number of IRQs can be higher than the number of requested
queues here. It's necessary to use the affinity masks created by
irq_create_affinity_masks as input.
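
A simplified model of how the per-IRQ masks feed mq_map (made-up
masks; note there are more vectors than queues, as described above):

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

int main(void)
{
	/*
	 * Invented per-vector affinity masks as
	 * irq_create_affinity_masks() might have spread them. Only the
	 * first nr_queues vectors are consumed by the map.
	 */
	bool irq_affinity[3][NR_CPUS] = {
		{ 1, 1, 0, 0 },		/* vector 0: CPUs 0-1 */
		{ 0, 0, 1, 1 },		/* vector 1: CPUs 2-3 */
		{ 1, 0, 1, 0 },		/* vector 2: not mapped */
	};
	int nr_queues = 2, queue_offset = 0;
	int mq_map[NR_CPUS];
	int queue, cpu;

	for (queue = 0; queue < nr_queues; queue++)
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			if (irq_affinity[queue][cpu])
				mq_map[cpu] = queue_offset + queue;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu %d -> hctx %d\n", cpu, mq_map[cpu]);
	return 0;
}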
> Which would imply that this is a usable interrupt affinity from the
> hardware perspective, irrespective on whether the cpu is online or
> not. So why the check to cpu_online()? Can't we simply take the existing affinity
> and rely on the hardware to do the right thing?
Again, this is tracking whether a hctx has an online housekeeping CPU.