Message-ID: <20241015132928.GA3961@lst.de>
Date: Tue, 15 Oct 2024 15:29:28 +0200
From: Christoph Hellwig <hch@....de>
To: Tero Kristo <tero.kristo@...ux.intel.com>
Cc: Christoph Hellwig <hch@....de>, linux-kernel@...r.kernel.org,
	axboe@...nel.dk, linux-nvme@...ts.infradead.org, sagi@...mberg.me,
	kbusch@...nel.org
Subject: Re: [PATCH 1/1] nvme-pci: Add CPU latency pm-qos handling

On Tue, Oct 15, 2024 at 12:25:37PM +0300, Tero Kristo wrote:
> I've been giving this some thought offline, but can't really think of
> how this could be done in the generic layers; the code needs to figure
> out the interrupt that gets fired by the activity, to prevent the CPU
> that is going to handle that interrupt from going into deep idle,
> potentially ruining the latency and throughput of the request. The
> knowledge of this interrupt mapping only resides at the driver level,
> in this case NVMe.
> 
> One thing that could be done is to prevent the whole feature from
> being used on setups where the number of CPUs per IRQ is above some
> threshold; let's say 4 as an example.

As a disclaimer, I don't really understand the PM QoS framework, just
the NVMe driver and the block layer.
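
If I read the patch right, per CPU it boils down to roughly the sketch
below, using the device PM QoS resume-latency interface (the my_* names
are made up for illustration, not taken from the patch):

#include <linux/cpu.h>
#include <linux/pm_qos.h>

/* illustrative per-CPU latency constraint, not actual patch code */
struct my_cpu_qos {
        struct dev_pm_qos_request       req;
        bool                            added;
};

/* keep @cpu out of idle states with a resume latency above @latency_us */
static int my_cpu_qos_constrain(struct my_cpu_qos *qos, unsigned int cpu,
                                s32 latency_us)
{
        struct device *dev = get_cpu_device(cpu);

        if (!dev)
                return -ENODEV;
        if (!qos->added) {
                int ret = dev_pm_qos_add_request(dev, &qos->req,
                                                 DEV_PM_QOS_RESUME_LATENCY,
                                                 latency_us);
                if (ret < 0)
                        return ret;
                qos->added = true;
                return 0;
        }
        return dev_pm_qos_update_request(&qos->req, latency_us);
}

/* drop the constraint again once the queue has gone idle */
static void my_cpu_qos_relax(struct my_cpu_qos *qos)
{
        if (qos->added)
                dev_pm_qos_update_request(&qos->req,
                                          PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
}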

With that, my gut feeling is that all this latency management should
be driven by the blk_mq_hw_ctx structure, the block layer equivalent
of a hardware queue.  And instead of having a per-CPU array of QoS
requests per device, there should be one request per CPU in the actual
cpumask of the hctx, so that you only have to iterate this small,
hctx-local data structure.
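
Roughly something like the following, reusing the per-CPU helpers from
the sketch above (again made-up names, not meant as actual patch code):

#include <linux/blk-mq.h>
#include <linux/slab.h>

/*
 * Illustrative per-hctx QoS state.  The array is sized for all possible
 * CPUs for simple indexing, but only the CPUs in hctx->cpumask are ever
 * touched.
 */
struct my_hctx_qos {
        struct my_cpu_qos       *cpus;          /* nr_cpu_ids entries */
        atomic_t                active;         /* single per-hctx check */
        unsigned long           last_used;      /* jiffies of last submission */
};

static struct my_hctx_qos *my_hctx_qos_alloc(struct blk_mq_hw_ctx *hctx)
{
        struct my_hctx_qos *qos;

        qos = kzalloc(sizeof(*qos), GFP_KERNEL);
        if (!qos)
                return NULL;
        qos->cpus = kcalloc(nr_cpu_ids, sizeof(*qos->cpus), GFP_KERNEL);
        if (!qos->cpus) {
                kfree(qos);
                return NULL;
        }
        return qos;
}

static void my_hctx_qos_free(struct my_hctx_qos *qos)
{
        unsigned int cpu;

        for_each_possible_cpu(cpu)
                if (qos->cpus[cpu].added)
                        dev_pm_qos_remove_request(&qos->cpus[cpu].req);
        kfree(qos->cpus);
        kfree(qos);
}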

Preferably there would be a single active check per hctx and not one
per CPU, i.e. when the block layer submits commands it has to do one
single check instead of an iteration.  Similarly, the block layer code
would time out the activity once per hctx, and only then iterate over
the (usually few) CPUs mapped to that hctx.
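
Continuing the sketch from above (the idle timeout value is just an
arbitrary example):

#define MY_QOS_IDLE_TIMEOUT_MS  50

/* submission fast path: one atomic test per hctx, no per-CPU iteration */
static void my_hctx_qos_mark_active(struct blk_mq_hw_ctx *hctx,
                                    struct my_hctx_qos *qos, s32 latency_us)
{
        unsigned int cpu;

        qos->last_used = jiffies;
        if (atomic_xchg(&qos->active, 1))
                return;

        /*
         * Only walks the few CPUs mapped to this hctx.  Note that the
         * dev_pm_qos calls can sleep, so in a real implementation this
         * would probably have to be deferred to a workqueue.
         */
        for_each_cpu(cpu, hctx->cpumask)
                my_cpu_qos_constrain(&qos->cpus[cpu], cpu, latency_us);
}

/* per-hctx expiry, e.g. driven by a delayed work owned by the hctx */
static void my_hctx_qos_expire(struct blk_mq_hw_ctx *hctx,
                               struct my_hctx_qos *qos)
{
        unsigned int cpu;

        if (time_before(jiffies, qos->last_used +
                        msecs_to_jiffies(MY_QOS_IDLE_TIMEOUT_MS)))
                return;
        if (!atomic_xchg(&qos->active, 0))
                return;

        for_each_cpu(cpu, hctx->cpumask)
                my_cpu_qos_relax(&qos->cpus[cpu]);
}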

