Message-ID: <c8419d9b-2b31-2190-3058-3625bdbcb13d@meta.com>
Date: Thu, 22 Jun 2023 11:57:48 -0400
From: Chris Mason <clm@...a.com>
To: Aaron Lu <aaron.lu@...el.com>, David Vernet <void@...ifault.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
rostedt@...dmis.org, dietmar.eggemann@....com, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
joshdon@...gle.com, roman.gushchin@...ux.dev, tj@...nel.org,
kernel-team@...a.com
Subject: Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS
On 6/21/23 2:03 AM, Aaron Lu wrote:
> On Wed, Jun 21, 2023 at 12:43:52AM -0500, David Vernet wrote:
>> On Wed, Jun 21, 2023 at 12:54:16PM +0800, Aaron Lu wrote:
>>> On Tue, Jun 20, 2023 at 09:43:00PM -0500, David Vernet wrote:
>>>> On Wed, Jun 21, 2023 at 10:35:34AM +0800, Aaron Lu wrote:
>>>>> On Tue, Jun 20, 2023 at 12:36:26PM -0500, David Vernet wrote:
[ ... ]
>>>> I'm not sure what we're hoping to gain by continuing to run various
>>>> netperf workloads with your specific parameters?
>>>
>>> I don't quite follow you.
>>>
>>> I thought we were in the process of figuring out why for the same
>>> workload(netperf/default_mode/nr_client=nr_cpu) on two similar
>>> machines(both are Skylake) you saw no contention while I saw some so I
>>> tried to be exact on how I run the workload.
>>
>> I just reran the workload on a 26 core / 52 thread Cooper Lake using
>> your exact command below and still don't observe any contention
>> whatsoever on the swqueue lock:
>
> Well, it's a puzzle to me.
>
> But as you said below, I guess I'll just move on.
Thanks for bringing this up, Aaron. The discussion moved on to different
ways to fix the netperf-triggered contention, but I wanted to toss this
out as an easy way to see the same problem:
# swqueue disabled:
# ./schbench -L -m 52 -p 512 -r 10 -t 1
Wakeup Latencies percentiles (usec) runtime 10 (s) (14674354 total samples)
20.0th: 8 (4508866 samples)
50.0th: 11 (2879648 samples)
90.0th: 35 (5865268 samples)
* 99.0th: 70 (1282166 samples)
99.9th: 110 (124040 samples)
min=1, max=9312
avg worker transfer: 28211.91 ops/sec 13.78MB/s
During the swqueue=0 run, the system was ~30% idle
# swqueue enabled:
# ./schbench -L -m 52 -p 512 -r 10 -t 1
Wakeup Latencies percentiles (usec) runtime 10 (s) (6448414 total samples)
20.0th: 30 (1383423 samples)
50.0th: 39 (1986827 samples)
90.0th: 63 (2446965 samples)
* 99.0th: 108 (567275 samples)
99.9th: 158 (57487 samples)
min=1, max=15018
avg worker transfer: 12395.27 ops/sec 6.05MB/s
During the swqueue=1 run, the CPUs were at 97% system time, all stuck
on spinlock contention in the scheduler.
This is a single-socket Cooper Lake with 26 cores / 52 threads.
The work is similar to the perf pipe test: 52 messenger threads, each
bouncing a message back and forth with its own private worker for a
10-second run.
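To make that shape concrete, here is a minimal sketch of one such
messenger/worker pair. This is not schbench itself, just the general
pattern: two threads blocking on each other over a pair of pipes, so
every message is a wakeup on each side.

#include <pthread.h>
#include <unistd.h>

#define LOOPS 100000

/* worker side: read a byte from the messenger, echo it back */
static void *worker(void *arg)
{
	int *fds = arg;		/* fds[0] = read from messenger, fds[1] = write back */
	char c;
	int i;

	for (i = 0; i < LOOPS; i++) {
		if (read(fds[0], &c, 1) != 1 || write(fds[1], &c, 1) != 1)
			break;
	}
	return NULL;
}

int main(void)
{
	int to_worker[2], to_messenger[2];
	int worker_fds[2];
	pthread_t tid;
	char c = 'x';
	int i;

	if (pipe(to_worker) || pipe(to_messenger))
		return 1;

	worker_fds[0] = to_worker[0];
	worker_fds[1] = to_messenger[1];
	pthread_create(&tid, NULL, worker, worker_fds);

	/* messenger side: each iteration wakes the worker and then blocks
	 * until the worker wakes us back up */
	for (i = 0; i < LOOPS; i++) {
		if (write(to_worker[1], &c, 1) != 1 ||
		    read(to_messenger[0], &c, 1) != 1)
			break;
	}
	pthread_join(tid, NULL);
	return 0;
}

schbench runs 52 of these pairs in parallel (one per -m messenger), so
the machine is doing little besides wakeups for the whole run.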
Adding more messenger threads (-m 128) increases the swqueue=0
throughput to about 19MB/s and drags the swqueue=1 throughput down to
2MB/s.
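Roughly what all of those CPUs are piling up on is a single per-LLC
list behind one spinlock, hit once by every wakeup and again by every
CPU going idle. The sketch below uses illustrative names, not the
actual code from the RFC, but it shows the shape:

/* Illustrative sketch only -- field and function names are made up,
 * not copied from the patch.  The point is the single per-LLC lock.
 */
struct swqueue {
	struct list_head	list;
	spinlock_t		lock;
};

/* wakeup path: every enqueue in the LLC takes the same lock */
static void swqueue_enqueue(struct swqueue *swq, struct task_struct *p)
{
	unsigned long flags;

	spin_lock_irqsave(&swq->lock, flags);
	list_add_tail(&p->swqueue_node, &swq->list);
	spin_unlock_irqrestore(&swq->lock, flags);
}

/* idle path: every CPU going idle in the LLC takes it again */
static struct task_struct *swqueue_pull(struct swqueue *swq)
{
	struct task_struct *p;
	unsigned long flags;

	spin_lock_irqsave(&swq->lock, flags);
	p = list_first_entry_or_null(&swq->list, struct task_struct,
				     swqueue_node);
	if (p)
		list_del_init(&p->swqueue_node);
	spin_unlock_irqrestore(&swq->lock, flags);

	return p;
}

With the swqueue=0 run doing ~1.5M wakeups/sec on this box, funneling
all of them through one cacheline-bouncing lock inside the LLC is
enough to explain the 97% system time above.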
-chris