Message-ID: <fe642c2f-4a7e-451a-b918-277c22904a7a@kernel.dk>
Date: Tue, 27 Feb 2024 19:21:30 -0700
From: Jens Axboe <axboe@...nel.dk>
To: linux-kernel@...r.kernel.org
Cc: peterz@...radead.org, mingo@...hat.com
Subject: Re: [PATCHSET v2 0/2] Split iowait into two states

On 2/27/24 2:06 PM, Jens Axboe wrote:
> I haven't been able to properly benchmark patch 1, as the atomics are
> lost in the noise in any workload that approximates normality. I can
> certainly concoct a synthetic test case if folks are interested. My gut
> says we're trading 3 fast path atomics for none, with the 4th case
> _probably_ being far less likely; there we grab the rq lock.
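
For context, the fast path atomics in question are the existing
nr_iowait accounting around schedule and wakeup. Simplified from the
mainline pattern in kernel/sched/core.c (not the patched code, and the
exact call sites differ a bit), it's roughly:

	/* __schedule(): task going to sleep while marked in_iowait */
	if (prev->in_iowait) {
		atomic_inc(&rq->nr_iowait);
		delayacct_blkio_start();
	}

	/* try_to_wake_up() and friends: task waking back up */
	if (p->in_iowait) {
		delayacct_blkio_end(p);
		atomic_dec(&task_rq(p)->nr_iowait);
	}

	/* readers, e.g. /proc/stat, sum the per-rq counts */
	for_each_possible_cpu(i)
		sum += atomic_read(&cpu_rq(i)->nr_iowait);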

OK, so on Chris's suggestion, I tried his schbench to exercise the
scheduling side. It's very futex-intensive, so I hacked up futex to set
iowait state when sleeping, and added simple accounting to that path so
I knew how many times it ran (a rough sketch of the hack is below). A
run of:

./schbench -m 60 -t 10 -p 8

on a 2-socket Intel(R) Xeon(R) Platinum 8458P with 176 threads shows no
regression in performance, and try_to_wake_up() locking the rq of the
task being scheduled in from another CPU doesn't seem to register much:
on the previous run I saw 2.21% there, and now it's 2.36%. But it was
also a better-performing run, which may have led to the increase.
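
The futex hack itself was nothing fancy. Modulo details, it was
something like the below around the sleep in futex_wait_queue(), where
the helper and counter names are made up for illustration:

	/* illustrative only: count how often this sleep path runs */
	static atomic64_t futex_iowait_hits;

	static void futex_wait_schedule(void)
	{
		/* mark the sleep as iowait, like io_schedule() does */
		int token = io_schedule_prepare();

		atomic64_inc(&futex_iowait_hits);
		schedule();
		io_schedule_finish(token);	/* restore old in_iowait */
	}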

Each run takes 30 seconds, and during that time I see around 290-310M
hits of that path, or roughly 10M/sec. Without modifying futex to use
iowait we obviously rarely hit it, about 200 times per run, which makes
sense as we're not really doing IO.

Anyway, just some data on this. If I leave the futex/pipe iowait in and
run the same test, I see no discernible difference in the profiles. In
fact, the highest cost across the tests is bringing in the
task->in_iowait cacheline.
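
FWIW, for anyone who wants to poke at this themselves, something along
the lines of the following should reproduce that kind of profile
(assuming perf and a schbench build, flags as above):

	perf record -g -- ./schbench -m 60 -t 10 -p 8
	perf report --no-children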

-- 
Jens Axboe

