Message-ID: <cc582ddb-2f16-4c0b-be27-b9a1dedb646a@linux.ibm.com>
Date: Tue, 22 Jul 2025 01:07:09 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>, clm@...a.com
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        vschneid@...hat.com
Subject: Re: [PATCH v2 00/12] sched: Address schbench regression



On 7/9/25 00:32, Peter Zijlstra wrote:
> On Mon, Jul 07, 2025 at 11:49:17PM +0530, Shrikanth Hegde wrote:
> 
>> Git bisect points to
>> # first bad commit: [dc968ba0544889883d0912360dd72d90f674c140] sched: Add ttwu_queue support for delayed tasks
> 
> Moo.. Are IPIs particularly expensive on your platform?
> 
>
It seems the cost of IPIs is what is hurting here.

IPI latency depends heavily on whether the target CPU is busy, in a shallow idle state, or in a deep idle state.
When it is in a deep idle state, the numbers are close to 5-8us on average on this small system.
When the CPU is busy (it could be running another schbench thread), it is around 1-2us.

In the baseline I measured the time taken to acquire the remote rq lock; that is only around 1-1.5us.
Also, the LLC here is a single small core (SMT4), so quite often the series would choose to send an IPI.


Did one more experiment: pinned the worker and message threads such that wakeups always send an IPI.

NO_TTWU_QUEUE_DELAYED

./schbench -L -m 4 -M auto -t 64 -n 0 -r 5 -i 5
average rps: 1549224.72
./schbench -L -m 4 -M 0-3 -W 4-39 -t 64 -n 0 -r 5 -i 5
average rps: 1560839.00

TTWU_QUEUE_DELAYED

./schbench -L -m 4 -M auto -t 64 -n 0 -r 5 -i 5             << IPIs could be sent quite often ***
average rps: 959522.31
./schbench -L -m 4 -M 0-3 -W 4-39 -t 64 -n 0 -r 5 -i 5      << IPIs are always sent; (M,W) don't share cache.
average rps: 470865.00                                      << rps goes even lower
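
(For completeness: the feature toggling above is presumably done via the debugfs sched features knob. A hedged sketch,
assuming the usual path, which varies by kernel version:

  echo NO_TTWU_QUEUE_DELAYED > /sys/kernel/debug/sched/features
  ./schbench -L -m 4 -M 0-3 -W 4-39 -t 64 -n 0 -r 5 -i 5
  echo TTWU_QUEUE_DELAYED > /sys/kernel/debug/sched/features
  ./schbench -L -m 4 -M 0-3 -W 4-39 -t 64 -n 0 -r 5 -i 5
)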


=================================

*** issues/observations in schbench.

Chris,

When one does -W auto or -M auto, I think the code is meant to run the n message threads on the first n CPUs and the
worker threads on the remaining CPUs?
I don't see that happening. That behavior can be achieved only with explicit -M <cpus> -W <cpus>.

         int i = 0;
         CPU_ZERO(m_cpus);
         for (int i = 0; i < m_threads; ++i) {
                 CPU_SET(i, m_cpus);
                 CPU_CLR(i, w_cpus);
         }
         for (; i < CPU_SETSIZE; i++) {             << "i" here is the outer variable, which is still 0, so w_cpus gets
                                                        set for every CPU; workers end up on all CPUs even with -W auto.
                 CPU_SET(i, w_cpus);
         }
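
The shadowed declaration looks like the culprit. A minimal sketch of a possible fix (my reading of the intended
behavior, not a tested patch): let both loops share the outer i, so workers only get the CPUs after the message CPUs:

         int i;
         CPU_ZERO(m_cpus);
         for (i = 0; i < m_threads; ++i) {          /* no shadowing "int i" here */
                 CPU_SET(i, m_cpus);
                 CPU_CLR(i, w_cpus);
         }
         for (; i < CPU_SETSIZE; i++) {             /* continues from i == m_threads */
                 CPU_SET(i, w_cpus);
         }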


Another issue is that if CPU0 is offline, then auto pinning fails. Maybe no one cares about that case?
