linux-kernel - Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

Message-ID: <Zj7C5fPCAdGwGsrI@fedora>
Date: Sat, 11 May 2024 08:59:17 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Keith Busch <kbusch@...nel.org>
Cc: Christoph Hellwig <hch@....de>, Keith Busch <kbusch@...a.com>,
	linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
	tglx@...utronix.de
Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Fri, May 10, 2024 at 06:41:58PM -0600, Keith Busch wrote:
> On Sat, May 11, 2024 at 07:50:21AM +0800, Ming Lei wrote:
> > On Fri, May 10, 2024 at 10:20:02AM -0600, Keith Busch wrote:
> > > On Fri, May 10, 2024 at 05:10:47PM +0200, Christoph Hellwig wrote:
> > > > On Fri, May 10, 2024 at 07:14:59AM -0700, Keith Busch wrote:
> > > > > From: Keith Busch <kbusch@...nel.org>
> > > > > 
> > > > > Some people _really_ want to control their interrupt affinity.
> > > > 
> > > > So let them argue why.  I'd rather have a really, really, really
> > > > good argument for this crap, and I'd like to hear it from the horses
> > > > mouth.
> > > 
> > > It's just prioritizing predictable user task scheduling for a subset of
> > > CPUs instead of having consistently better storage performance.
> > > 
> > > We already have "isolcpus=managed_irq," parameter to prevent managed
> > > interrupts from running on a subset of CPUs, so the use case is already
> > > kind of supported. The problem with that parameter is it is a no-op if
> > > the starting affinity spread contains only isolated CPUs.
> > 
> > Can you explain a bit why it is a no-op? If only isolated CPUs are
> > spread on one queue, there will be no IO originated from these isolated
> > CPUs, that is exactly what the isolation needs.
> 
> The "isolcpus=managed_irq," option doesn't limit the dispatching CPUs.

Please see commit a46c27026da1 ("blk-mq: don't schedule block kworker on isolated CPUs")
in for-6.10/block.

> It only limits where the managed irq will assign the effective_cpus as a
> best effort.

Most of the time it does work.

> 
> Example, I boot with a system with 4 threads, one nvme device, and
> kernel parameter:
> 
>   isolcpus=managed_irq,2-3
> 
> Run this:
> 
>   for i in $(seq 0 3); do taskset -c $i dd if=/dev/nvme0n1 of=/dev/null bs=4k count=1000 iflag=direct; done

That is a problem with the test: when you isolate CPUs '2-3', you are not
expected to submit IO or run applications on those isolated CPUs.
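
As a rough sketch of what such a test could look like, the loop above can be
restricted to the non-isolated (housekeeping) CPUs. The helper names
expand_cpulist and housekeeping_cpus are made up for illustration, and the
CPU count (4) and isolated list (2-3) are simply taken from the example and
passed in by hand rather than read from the running kernel:

```shell
#!/bin/bash
# Illustrative helpers only; not part of any kernel or util-linux tooling.

# Expand a kernel-style CPU list such as "0,2-3" into one CPU number per line.
expand_cpulist() {
    local list=$1 part lo hi
    local IFS=,
    for part in $list; do
        lo=${part%-*}   # text before the dash (or the whole item)
        hi=${part#*-}   # text after the dash (or the whole item)
        seq "$lo" "$hi"
    done
}

# Print the CPUs in 0..(ncpus-1) that are NOT in the isolated list.
housekeeping_cpus() {
    local ncpus=$1 isolated=$2 cpu
    for cpu in $(seq 0 $((ncpus - 1))); do
        if ! expand_cpulist "$isolated" | grep -qx "$cpu"; then
            echo "$cpu"
        fi
    done
}

# Assumed usage for the 4-thread example (needs root and /dev/nvme0n1):
#   for i in $(housekeeping_cpus 4 2-3); do
#       taskset -c "$i" dd if=/dev/nvme0n1 of=/dev/null bs=4k count=1000 iflag=direct
#   done
```

With the values from the example, only CPUs 0 and 1 would submit IO, so the
isolated CPUs 2-3 never originate any requests.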


Thanks, 
Ming