Date:   Fri, 15 Oct 2021 00:43:45 -0700
From:   Norbert <nbrtt01@...il.com>
To:     linux-kernel@...r.kernel.org
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Performance regression: thread wakeup time (latency) increased up to 3x

Performance regression: thread wakeup time (latency) increased up to 3x.

Happened between 5.13.8 and 5.14.0. Still happening at least on 5.14.11.

Short story:
------------
On isolated  CPUs, wakeup increased from 1080 ns to 3550 ns. (3.3x)
On non-isol. CPUs, wakeup increased from  980 ns to 1245 ns. (1.3x)

Such an increase is surely not an expected side effect of an intentional
change, especially considering that threads on isolated CPUs are often
latency sensitive. It also, for example, significantly reduces
throughput on contended locks in general (wakeups take 1.3x as long).

Long Story:
-----------
Time is measured from just before the futex-wake call on thread A until
the futex-wait call returns on thread B.

Times are similar for eventfd write -> blocked-read, just a bit higher.

Threads A and B have their affinity set to two neighboring CPUs on a
Threadripper Zen2 CPU at a fixed frequency of 4.0 GHz. On isolated CPUs
they run with SCHED_FIFO, on non-isolated CPUs with SCHED_OTHER; however,
the scheduling policy does not make a big difference (I also measured
the other combinations).

Measured 5.13.0, 5.13.8, 5.14.0, 5.14.9 and 5.14.11.
Some on Fedora 35 Beta, some on ClearLinux 35100.
All given times are measured with multi-user.target (no GUI shell). 
Times on graphical.target (with GUI shell) are about 10% higher.

These values are not averages over separate groups of shorter and
longer times. This is a typical distribution:
(None are less than 3300 ns, and none are more than 5099 ns.)
  count with 33nn ns: 858
  count with 34nn ns: 19359
  count with 35nn ns: 57257
  count with 36nn ns: 6135
  count with 37nn ns: 150
  count with 38nn ns: 48
  count with 39nn ns: 11
  count with 40nn ns: 10
  count with 41nn ns: 10
  count with 42nn ns: 10
  count with 43nn ns: 7
  count with 44nn ns: 11
  count with 45nn ns: 3
  count with 46nn ns: 6
  count with 47nn ns: 3
  count with 48nn ns: 4
  count with 49nn ns: 1
  count with 50nn ns: 3
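
As an illustration of how a listing like the one above can be produced
and summarized, here is a small sketch. The function name bucket_counts
is hypothetical; the REPORTED counts are copied from the listing above.
It groups samples into 100 ns buckets and computes how much of the
distribution falls into the 34nn to 36nn buckets.

```python
from collections import Counter


def bucket_counts(samples_ns, width_ns=100):
    """Group latency samples into buckets of width_ns, keyed by ns // width_ns."""
    counts = Counter(ns // width_ns for ns in samples_ns)
    return dict(sorted(counts.items()))


# Counts copied from the distribution listed in this report.
REPORTED = {
    33: 858, 34: 19359, 35: 57257, 36: 6135, 37: 150, 38: 48,
    39: 11, 40: 10, 41: 10, 42: 10, 43: 7, 44: 11,
    45: 3, 46: 6, 47: 3, 48: 4, 49: 1, 50: 3,
}

total = sum(REPORTED.values())
core = sum(REPORTED[b] for b in (34, 35, 36))
core_share = 100 * core / total

for bucket, count in REPORTED.items():
    print(f"  count with {bucket}nn ns: {count}")
print(f"total samples: {total}")
print(f"share in 3400..3699 ns: {core_share:.1f}%")
```

With these counts, 82751 of 83886 samples (about 98.6%) lie between
3400 ns and 3699 ns, which supports the point that the figure is not
a mix of much shorter and much longer wakeups.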

Also the times for the futex-wake call itself increased significantly:

On isolated  CPUs, wake call increased from 510 ns to 710 ns. (1.4x)
On non-isol. CPUs, wake call increased from 420 ns to 580 ns. (1.4x)
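
The factors quoted in this report can be rechecked from the before and
after times. A short sketch (the slowdown helper and the labels are
only illustrative; the nanosecond values are the ones given in this
report):

```python
def slowdown(before_ns, after_ns):
    """Ratio of the new time to the old time."""
    return after_ns / before_ns


# (label, before in ns, after in ns), values taken from this report.
MEASUREMENTS = [
    ("wakeup, isolated CPUs", 1080, 3550),
    ("wakeup, non-isolated CPUs", 980, 1245),
    ("wake call, isolated CPUs", 510, 710),
    ("wake call, non-isolated CPUs", 420, 580),
]

for label, before, after in MEASUREMENTS:
    ratio = slowdown(before, after)
    print(f"{label}: {before} -> {after} ns, +{after - before} ns, {ratio:.1f}x")
```

In absolute terms the wakeup on isolated CPUs grew by 2470 ns, while
the wake call itself grew by 200 ns, so the wake call accounts for only
a small part of the added wakeup latency.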

This is my first time reporting a kernel problem, so please excuse me if
this is not the right place or format. (Also, I don't yet have the
know-how to bisect arbitrary kernel versions or to compile specific
patches.)
