Message-ID: <87cye3zcvv.ffs@tglx>
Date: Wed, 26 Mar 2025 22:43:32 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
 lkp@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org, Eric Dumazet
 <edumazet@...gle.com>, Benjamin Segall <bsegall@...gle.com>, Frederic
 Weisbecker <frederic@...nel.org>
Subject: Re: [tip:timers/core] [posix]  1535cb8028:
 stress-ng.epoll.ops_per_sec 36.2% regression

On Wed, Mar 26 2025 at 22:11, Mateusz Guzik wrote:
> On Wed, Mar 26, 2025 at 09:07:51AM +0100, Thomas Gleixner wrote:
>> How on earth can this commit result in both a 36% regression and a 25%
>> improvement with the same test?
>> 
>> Unfortunately I can't reproduce any of it. I checked the epoll test
>> source and it uses a posix timer, but that commit makes the hash less
>> contended so there is zero explanation.
>> 
>
> The short summary is:
> 1. your change is fine
> 2. stress-ng is doing seriously weird stuff here resulting in the above
> 3. there may or may not be something the scheduler can do to help
>
> for the regression stats are saying:
> feb864ee99a2d8a2 1535cb80286e6fbc834f075039f
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>       5.97 ± 56%     +35.8       41.74 ± 24%  mpstat.cpu.all.idle%
>       0.86 ±  3%      -0.3        0.51 ± 11%  mpstat.cpu.all.irq%
>       0.10 ±  3%      +2.0        2.11 ± 13%  mpstat.cpu.all.soft%
>      92.01 ±  3%     -37.7       54.27 ± 18%  mpstat.cpu.all.sys%
>       1.06 ±  3%      +0.3        1.37 ±  8%  mpstat.cpu.all.usr%
>      27.83 ± 38%     -84.4%       4.33 ± 31%  mpstat.max_utilization.seconds
>
> As in system time went down and idle went up.
>
> Your patch must have a side effect where it messes with some of the
> timings between workers.

It does, as it removes the global lock and the potential contention on
it.
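
For illustration only, here is a minimal userspace sketch of why dropping a
global lock changes the timing between workers. It assumes per-bucket locking
as the replacement scheme, which is one common approach and not necessarily
what the commit does. The class names, `NUM_BUCKETS`, and the helper
functions are all hypothetical.

```python
import threading

NUM_BUCKETS = 8  # hypothetical table size, chosen only for the example


class GlobalLockTable:
    """Every operation serializes on one lock, whichever bucket it touches."""

    def __init__(self):
        self.lock = threading.Lock()
        self.buckets = [dict() for _ in range(NUM_BUCKETS)]

    def insert(self, key, value):
        with self.lock:
            self.buckets[hash(key) % NUM_BUCKETS][key] = value

    def lookup(self, key):
        with self.lock:
            return self.buckets[hash(key) % NUM_BUCKETS].get(key)


class PerBucketLockTable:
    """Each bucket has its own lock, so only same-bucket operations contend."""

    def __init__(self):
        self.locks = [threading.Lock() for _ in range(NUM_BUCKETS)]
        self.buckets = [dict() for _ in range(NUM_BUCKETS)]

    def insert(self, key, value):
        idx = hash(key) % NUM_BUCKETS
        with self.locks[idx]:
            self.buckets[idx][key] = value

    def lookup(self, key):
        idx = hash(key) % NUM_BUCKETS
        with self.locks[idx]:
            return self.buckets[idx].get(key)


def fill(table, worker_id, count):
    # Each worker inserts its own distinct keys.
    for i in range(count):
        table.insert((worker_id, i), i)


def run_workers(table, workers=4, count=1000):
    # Run the workers concurrently and return the total number of entries.
    threads = [
        threading.Thread(target=fill, args=(table, w, count))
        for w in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(len(bucket) for bucket in table.buckets)
```

Both tables end up with the same contents. The difference is only in who
waits for whom: with the global lock every worker queues behind every other
worker, while with per-bucket locks workers mostly proceed independently, so
the order and timing in which they make progress shifts.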

> The testcase is doing a lot of weird stuff, including calling yield()
> for every loop iteration. On top of that if the other worker does not
> win the race there is also a sleep of 0.1s thrown in. I commented these
> suckers out and weird anomalies persisted.
>
> All that said, I'm not going to further look into it. Was curious wtf
> though hence the write up.

Thank you for taking the time to look into this. The analysis of this
"benchmark" is a fun read, and it matches the impression I got from
looking at the source of this thing: it does weird stuff, which does
not make any sense at all.

Thanks,

        tglx