Message-ID: <mEiqrmv9bhO2K6CATsFiUM9mMKO22lOoCxD6l8knZlI2J_qqWeTsHfNvNHW136tO0kY2fdPvGHTNL3Mu58mzS1rahajdiyvP6d2btDPLJxc=@ashe.io>
Date: Wed, 18 Jun 2025 15:53:12 +0000
From: "Sean A." <sean@...e.io>
To: John Garry <john.g.garry@...cle.com>
Cc: "James.Bottomley@...senpartnership.com" <James.Bottomley@...senPartnership.com>, "atomlin@...mlin.com" <atomlin@...mlin.com>, "kashyap.desai@...adcom.com" <kashyap.desai@...adcom.com>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>, "martin.petersen@...cle.com" <martin.petersen@...cle.com>, "mpi3mr-linuxdrv.pdl@...adcom.com" <mpi3mr-linuxdrv.pdl@...adcom.com>, "sreekanth.reddy@...adcom.com" <sreekanth.reddy@...adcom.com>, "sumit.saxena@...adcom.com" <sumit.saxena@...adcom.com>
Subject: Re: [RFC PATCH v2 1/1] scsi: mpi3mr: Introduce smp_affinity_enable module parameter
Thank you, we'll certainly look into rq_affinity. We do isolate with the managed_irq flag (isolcpus=managed_irq,<cpulist>), so I did not expect to see these interrupts spanning the isolated core set.
Every other driver we use either honors isolation with managed_irq or exposes tunables (as proposed in the parent) that afford some control over this behavior for people like us. I realize we are in the minority here, but there is a tangible impact to our sort of business from an increase in interrupt rates on critical cores across a population of machines at scale. It would be good to know whether this was a conscious decision by the maintainers to prioritize their controller's performance or a simple omission, so that we can decide whether to continue pursuing this or to research other [vendor] options.
SA
On Wednesday, June 18th, 2025 at 2:49 AM, John Garry <john.g.garry@...cle.com> wrote:
>
>
> On 17/06/2025 17:34, Sean A. wrote:
>
> > On 17 Jun 2025, John Garry wrote:
> >
> > > You have given no substantial motivation for this change
> >
> > From my perspective, workloads exist (defense, telecom, finance, RT, etc.) that prefer not to be interrupted, and developers may opt to use CPU isolation and other mechanisms to reduce the likelihood of being preempted, evicted, etc. This includes steering interrupts away from an isolated set of cores. Also, while this isn't backed by actual benchmarking, forcing your way onto every core in a 192-core system and refusing to move seems needlessly greedy, and possibly detrimental to performance if most of the core set is NUMA-foreign to the storage controller. One should be able to make placement decisions that protect application threads from interruption and ensure the interrupt handler has a sleepy, local core to play with, without lighting up a bunch of interconnect paths on the way.
> >
> > Generically, I believe interfaces like /proc/irq/$irq/smp_affinity[_list] should be allowed to work as expected, and things like irqbalance should also be able to do their jobs unless there is a good (documented) reason they should not.
>
>
> There is a good reason. Some of these storage controllers have hundreds
> of MSI-X vectors - typically one per CPU. If you offline CPUs, those
> interrupts need to be migrated to target other CPUs. And for
> architectures like x86, a CPU can only have a finite and relatively
> modest number of interrupts targeted at it. That is why managed
> interrupts are used (which this module parameter would disable for this
> controller).
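
For concreteness, here is a minimal sketch of how a driver typically ends up with managed vectors - this is not the actual mpi3mr code, and the function name and vector counts are made up for illustration - by asking the PCI core to spread MSI-X vectors at allocation time:

#include <linux/pci.h>
#include <linux/interrupt.h>

/* Hypothetical example; names and vector counts are illustrative only. */
static int example_setup_queue_irqs(struct pci_dev *pdev, unsigned int nr_queues)
{
	/* Default affinity descriptor: spread all vectors, none reserved. */
	struct irq_affinity affd = { };
	int nvecs;

	/*
	 * PCI_IRQ_AFFINITY is what makes these vectors "managed": the core
	 * spreads them across the possible CPUs at allocation time, and they
	 * cannot be retargeted later via /proc/irq/<N>/smp_affinity.
	 */
	nvecs = pci_alloc_irq_vectors_affinity(pdev, 1, nr_queues,
					       PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					       &affd);
	return nvecs < 0 ? nvecs : 0;
}

If I understand the RFC correctly, smp_affinity_enable=0 would essentially drop PCI_IRQ_AFFINITY from a call like that, leaving the vectors retargetable (and movable by irqbalance) again.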
>
> BTW, if you use taskset to set the affinity of a process and ensure that
> /sys/block/xxx/queue/rq_affinity is set so that we complete on the same
> CPU as submitted, then I thought that this would ensure that interrupts
> are not bothering other CPUs.
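
If it helps anyone experiment with that suggestion, here is a rough userspace sketch (the device path and CPU number are placeholders; rq_affinity=2 is the setting that forces completions back onto the submitting CPU):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	const int cpu = 2;	/* placeholder: CPU the I/O workload is pinned to */
	const char *knob = "/sys/block/sda/queue/rq_affinity";	/* placeholder device */
	cpu_set_t set;
	FILE *f;

	/* Equivalent of `taskset -c 2 <cmd>`, but for the current process. */
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}

	/* rq_affinity=2 forces block-layer completions onto the submitting CPU. */
	f = fopen(knob, "w");		/* needs root */
	if (!f) {
		perror(knob);
		return 1;
	}
	fputs("2\n", f);
	if (fclose(f)) {
		perror(knob);
		return 1;
	}

	/* ...submit I/O from here; completions should stay on this CPU... */
	return 0;
}

We'll give that a try and report back with what we see on the isolated cores.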