Message-ID: <20241004101014.3716006-1-tero.kristo@linux.intel.com>
Date: Fri,  4 Oct 2024 13:09:27 +0300
From: Tero Kristo <tero.kristo@...ux.intel.com>
To: 
Cc: linux-kernel@...r.kernel.org,
	axboe@...nel.dk,
	hch@....de,
	linux-nvme@...ts.infradead.org,
	sagi@...mberg.me,
	kbusch@...nel.org
Subject: [PATCH 0/1] nvme-pci: Add CPU latency pm-qos handling

Hello,

Re-posting this now that 6.12-rc1 is out; the previous RFC did not
receive any feedback. The patch itself is unchanged, but I have included
the cover letter again for reference.

The patch adds a mechanism for tackling NVMe latency under random
workloads. A new sysfs knob (cpu_latency_us) is added under NVMe
devices, which can be used to fine-tune the PM QoS CPU latency limit
while the NVMe device is operational.
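As a usage sketch, the knob could be driven from a shell like this. Note
that the exact sysfs path is an assumption based on the description
above ("under NVME devices"); check the patch for the real attribute
location.

```shell
# Cap CPU wakeup latency at 10 us while nvme0 is doing I/O.
# NOTE: /sys/class/nvme/nvme0/cpu_latency_us is an assumed path,
# not confirmed by the patch itself.
echo 10 > /sys/class/nvme/nvme0/cpu_latency_us

# Remove the limit again (-1 disables it, as in the measurements below).
echo -1 > /sys/class/nvme/nvme0/cpu_latency_us
```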

Below is a postprocessed measurement run on an Ice Lake Xeon platform,
measuring latencies with the 'fio' tool, running the random-read and
read profiles. Five random-read and five bulk-read runs are done with
the latency limit enabled and disabled, and the maximum 'slat'
(submission latency), 'clat' (completion latency) and 'lat' (total
latency) values are shown for each setup; values are in microseconds.
Bandwidth is measured with fio's 'read' profile, and min-avg-max values
are shown in MiB/s. 'c6%' is the percentage of time the CPU running
'fio' spent in the C6 idle state during the test.

==
Setting cpu_latency_us limit to 10 (enabled)
  slat: 31, clat: 99, lat: 113, bw: 1156-1332-1359, c6%: 2.8
  slat: 49, clat: 135, lat: 143, bw: 1156-1332-1361, c6%: 1.0
  slat: 67, clat: 148, lat: 156, bw: 1159-1331-1361, c6%: 0.9
  slat: 51, clat: 99, lat: 107, bw: 1160-1330-1356, c6%: 1.0
  slat: 82, clat: 114, lat: 122, bw: 1156-1333-1359, c6%: 1.0
Setting cpu_latency_us limit to -1 (disabled)
  slat: 112, clat: 275, lat: 364, bw: 1153-1334-1364, c6%: 80.0
  slat: 110, clat: 270, lat: 324, bw: 1164-1338-1369, c6%: 80.1
  slat: 106, clat: 260, lat: 320, bw: 1159-1330-1362, c6%: 79.7
  slat: 110, clat: 255, lat: 300, bw: 1156-1332-1363, c6%: 80.2
  slat: 107, clat: 248, lat: 322, bw: 1152-1331-1362, c6%: 79.9
==

In summary, the C6-induced latencies are eliminated from the
random-read tests ('clat' drops from 250+ us to 100-150 us), and in the
maximum-throughput tests the bandwidth is not negatively impacted (the
values are practically identical), so the overhead introduced is
minimal.
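For reference, the kind of postprocessing described above could be
sketched as follows. This is a hypothetical helper for parsing the
result lines shown in this mail, not the actual script used for the
measurements.

```python
import re

# Matches one result line, e.g.:
#   slat: 31, clat: 99, lat: 113, bw: 1156-1332-1359, c6%: 2.8
LINE_RE = re.compile(
    r"slat:\s*(?P<slat>\d+),\s*clat:\s*(?P<clat>\d+),\s*lat:\s*(?P<lat>\d+),\s*"
    r"bw:\s*(?P<bw_min>\d+)-(?P<bw_avg>\d+)-(?P<bw_max>\d+),\s*c6%:\s*(?P<c6>[\d.]+)"
)

def parse_run(line):
    """Parse one result line into a dict of numeric fields."""
    m = LINE_RE.search(line)
    if not m:
        raise ValueError(f"unrecognized line: {line!r}")
    return {k: float(v) for k, v in m.groupdict().items()}

def summarize(lines):
    """Worst-case completion latency and average C6 residency over runs."""
    runs = [parse_run(l) for l in lines]
    return {
        "max_clat_us": max(r["clat"] for r in runs),
        "avg_c6_pct": sum(r["c6"] for r in runs) / len(runs),
    }
```

Feeding it the five enabled-limit lines above would report the worst
'clat' across the runs, making the enabled/disabled comparison a
one-liner.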

-Tero

