Message-ID: <accb9ceb501197b71259d8d3996c461dcef1e7d6.camel@linux.intel.com>
Date: Wed, 09 Oct 2024 11:24:45 +0300
From: Tero Kristo <tero.kristo@...ux.intel.com>
To: Christoph Hellwig <hch@....de>
Cc: linux-kernel@...r.kernel.org, axboe@...nel.dk,
linux-nvme@...ts.infradead.org, sagi@...mberg.me, kbusch@...nel.org
Subject: Re: [PATCH 1/1] nvme-pci: Add CPU latency pm-qos handling

On Wed, 2024-10-09 at 10:00 +0200, Christoph Hellwig wrote:
> On Wed, Oct 09, 2024 at 09:45:07AM +0300, Tero Kristo wrote:
> > Initially, I posted the patch against block layer, but there the
> > recommendation was to move this closer to the HW; i.e. NVMe driver
> > level.
>
> Even if it is called from NVMe, a lot of the code is not NVMe
> specific. Some of it appears block specific and other parts are
> entirely generic.
>
> But I still don't see how walking cpumasks and updating parameters
> far away (in terms of cache lines and pointer dereferences) for
> every single I/O could work without having a huge performance
> impact.
>
Generally, the cpumask only has a couple of CPUs on it; yes, it is true
that on certain setups every CPU in the system may end up on it, but in
that case the user has the option of not enabling this feature at all.
On my test system there is a separate NVMe IRQ for each CPU, so the
affinity mask contains only one bit.
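
For illustration, the setup side is roughly of this shape. This is only
a sketch under the assumption of per-CPU dev_pm_qos resume-latency
requests, not the actual patch code, and the names below are made up
for this mail:

        #include <linux/cpu.h>          /* get_cpu_device() */
        #include <linux/cpumask.h>
        #include <linux/irq.h>          /* irq_get_affinity_mask() */
        #include <linux/pm_qos.h>

        /* sketch only: one request per possible CPU, error handling
         * omitted */
        static struct dev_pm_qos_request nvme_cpu_req[NR_CPUS];

        static void nvme_qos_setup(int irq)
        {
                const struct cpumask *mask = irq_get_affinity_mask(irq);
                unsigned int cpu;

                /*
                 * With a per-CPU NVMe interrupt the affinity mask has a
                 * single bit set, so this loop visits exactly one CPU.
                 */
                for_each_cpu(cpu, mask)
                        dev_pm_qos_add_request(get_cpu_device(cpu),
                                        &nvme_cpu_req[cpu],
                                        DEV_PM_QOS_RESUME_LATENCY,
                                        PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
        }
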
Also, the code tries to avoid calling the heavy PM QoS machinery by
checking whether the request is already active, and by deferring the
value updates to a workqueue. Generally the heavier parameter update
only happens on the first access of a burst of NVMe I/O.
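
A rough sketch of that deferral, continuing the illustrative example
above (the nvme_qos struct, the function names and the constraint value
are invented for this mail, not what the patch literally does):

        #include <linux/kernel.h>
        #include <linux/pm_qos.h>
        #include <linux/workqueue.h>

        struct nvme_qos {
                struct dev_pm_qos_request req;  /* per-CPU request */
                bool active;                    /* burst in progress? */
                struct work_struct work;
        };

        static void nvme_qos_work(struct work_struct *work)
        {
                struct nvme_qos *q = container_of(work, struct nvme_qos,
                                                  work);

                /* the heavy part: tighten the CPU's resume-latency limit */
                dev_pm_qos_update_request(&q->req, 0);
        }

        /* fast path, called on I/O submission */
        static void nvme_qos_touch(struct nvme_qos *q)
        {
                if (READ_ONCE(q->active))
                        return;         /* burst already active */

                WRITE_ONCE(q->active, true);
                schedule_work(&q->work);        /* defer heavy update */
        }

So per I/O the common-case cost is a single flag test; the
cache-line-heavy QoS update runs at most once per burst, from the work
item, outside the submission path.
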
-Tero