Message-ID: <6sn76aya225pqikijue5uv5h3lyqk262hc6ru3vemn7xofdftd@sw7gith52xh7>
Date: Wed, 26 Mar 2025 22:11:23 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev, 
	lkp@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org, 
	Eric Dumazet <edumazet@...gle.com>, Benjamin Segall <bsegall@...gle.com>, 
	Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [tip:timers/core] [posix]  1535cb8028:
 stress-ng.epoll.ops_per_sec 36.2% regression

On Wed, Mar 26, 2025 at 09:07:51AM +0100, Thomas Gleixner wrote:
> On Mon, Mar 24 2025 at 14:39, kernel test robot wrote:
> > kernel test robot noticed a 36.2% regression of stress-ng.epoll.ops_per_sec on:
> >
> > commit: 1535cb80286e6fbc834f075039f85274538543c7 ("posix-timers: Improve hash table performance")
> > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git timers/core
> >
[snip]
> > | testcase: change | stress-ng: stress-ng.epoll.ops_per_sec 124.9% improvement                       |
> 
> How on earth can this commit result in both a 36% regression and a 25%
> improvement with the same test?
> 
> Unfortunately I can't reproduce any of it. I checked the epoll test
> source and it uses a posix timer, but that commit makes the hash less
> contended so there is zero explanation.
> 

The short summary is:
1. your change is fine
2. stress-ng is doing seriously weird stuff here resulting in the above
3. there may or may not be something the scheduler can do to help

For the regression, the stats say:
feb864ee99a2d8a2 1535cb80286e6fbc834f075039f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      5.97 ± 56%     +35.8       41.74 ± 24%  mpstat.cpu.all.idle%
      0.86 ±  3%      -0.3        0.51 ± 11%  mpstat.cpu.all.irq%
      0.10 ±  3%      +2.0        2.11 ± 13%  mpstat.cpu.all.soft%
     92.01 ±  3%     -37.7       54.27 ± 18%  mpstat.cpu.all.sys%
      1.06 ±  3%      +0.3        1.37 ±  8%  mpstat.cpu.all.usr%
     27.83 ± 38%     -84.4%       4.33 ± 31%  mpstat.max_utilization.seconds

In other words, system time went down and idle time went up.

Your patch must have a side effect where it messes with some of the
timings between workers.

However, there is a possibility the scheduler could be doing something
better here -- the testcase spawned as-is has wildly unstable
performance (literally orders of magnitude of difference between runs,
and tons of idle time), and it stabilizes if I use a taskset.

In an attempt to narrow it down I tried with a few workers:
taskset --cpu-list 1,2 stress-ng --timeout 10 --times --verify --metrics --no-rand-seed --epoll 1

--epoll 1 spawns two worker threads and both are bound to only execute
on cores 1 and 2.
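
For reference, this is roughly what that taskset invocation amounts to
under the hood (a minimal sketch, not taskset or stress-ng source): the
process restricts itself to CPUs 1 and 2 via sched_setaffinity(2) and
then execs the workload, so the spawned workers inherit the mask:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(1, &set);       /* allow CPU 1 */
        CPU_SET(2, &set);       /* allow CPU 2 */

        /* pid 0 == the calling process; children inherit the mask */
        if (sched_setaffinity(0, sizeof(set), &set) == -1) {
                perror("sched_setaffinity");
                return 1;
        }

        /* exec the actual workload */
        execlp("stress-ng", "stress-ng", "--timeout", "10",
               "--epoll", "1", (char *)NULL);
        perror("execlp");
        return 1;
}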

With this setup I consistently see high CPU usage and total executed ops
hovering around 190k.  Sample time output:
1.31s user 18.67s system 199% cpu 10.02s (10.023) total
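
(That 199% is just (1.31 + 18.67) / 10.02, i.e. the two workers between
them keep the two allowed CPUs almost fully busy.)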

If I whack the taskset or extend it to 1,2,3,4:
taskset --cpu-list 1,2,3,4 stress-ng --timeout 10 --times --verify --metrics --no-rand-seed --epoll 1

... I'm back to nonsensical perf, all the way down to 18k ops/s on the
lower end and over 200k on the higher one.

Sample time outputs in consecutive runs:
0.02s user 0.38s system 3% cpu 10.06s (10.060) total
0.34s user 4.59s system 48% cpu 10.13s (10.132) total

In other words, during the first run the test spent almost the entire
time off CPU. During the second one it used only about a quarter of the
CPU time it could have: (0.34 + 4.59) / 10.13 is roughly 49%, out of
the ~200% available to the two workers.

The testcase is doing a lot of weird stuff, including calling yield()
on every loop iteration. On top of that, if the other worker does not
win the race, a sleep of 0.1s is thrown in. I commented these suckers
out and the weird anomalies persisted.
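
To illustrate the loop shape described above (a rough sketch of the
behaviour, not the actual stress-ng source; do_one_epoll_op() is a
hypothetical stand-in for the per-iteration epoll work):

#include <sched.h>
#include <stdbool.h>
#include <time.h>

/* hypothetical stand-in for one iteration of the epoll work; returns
 * false when this worker loses the race against its peer */
extern bool do_one_epoll_op(void);

static void worker_loop(volatile bool *stop)
{
        /* the 0.1s back-off mentioned above */
        const struct timespec backoff = { 0, 100 * 1000 * 1000 };

        while (!*stop) {
                sched_yield();          /* yield on every iteration */

                if (!do_one_epoll_op())
                        nanosleep(&backoff, NULL); /* lost: sleep 0.1s */
        }
}

With a loop like that, the ops/s figure presumably ends up hinging on
how often the losing side eats the 0.1s penalty and where the scheduler
puts the two workers after each yield, which would fit the instability
above.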

All that said, I'm not going to look into it further. I was curious wtf
was going on though, hence the write-up.
