Message-ID: <ZJKSyC29PfQcQsAr@ziqianlu-kbl>
Date: Wed, 21 Jun 2023 14:03:52 +0800
From: Aaron Lu <aaron.lu@...el.com>
To: David Vernet <void@...ifault.com>
CC: Peter Zijlstra <peterz@...radead.org>,
<linux-kernel@...r.kernel.org>, <mingo@...hat.com>,
<juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
<rostedt@...dmis.org>, <dietmar.eggemann@....com>,
<bsegall@...gle.com>, <mgorman@...e.de>, <bristot@...hat.com>,
<vschneid@...hat.com>, <joshdon@...gle.com>,
<roman.gushchin@...ux.dev>, <tj@...nel.org>, <kernel-team@...a.com>
Subject: Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS
On Wed, Jun 21, 2023 at 12:43:52AM -0500, David Vernet wrote:
> On Wed, Jun 21, 2023 at 12:54:16PM +0800, Aaron Lu wrote:
> > On Tue, Jun 20, 2023 at 09:43:00PM -0500, David Vernet wrote:
> > > On Wed, Jun 21, 2023 at 10:35:34AM +0800, Aaron Lu wrote:
> > > > On Tue, Jun 20, 2023 at 12:36:26PM -0500, David Vernet wrote:
> > > > > On Fri, Jun 16, 2023 at 08:53:38AM +0800, Aaron Lu wrote:
> > > > > > I also tried that on the Skylake with 18 cores/36 threads per LLC,
> > > > > > and the contention is indeed much smaller than with UDP_RR:
> > > > > >
> > > > > > 7.30% 7.29% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> > > > > >
> > > > > > But I wouldn't say it's entirely gone. Also consider that Skylake
> > > > > > has far fewer cores per LLC than later Intel servers like Icelake
> > > > > > and Sapphire Rapids, so I expect things would be worse on those two
> > > > > > machines.
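(As an aside, and not a command from the thread itself: a quick way to
compare the "cores per LLC" point across machines is the standard x86
sysfs cache topology, where index3 is the L3 on these parts:

    # logical CPUs sharing the L3 (LLC) with CPU 0
    cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list

On the 18-core/36-thread-per-LLC Skylake above this would list 36
logical CPUs; Icelake and Sapphire Rapids parts would list many more.)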
> > > > >
> > > > > I cannot reproduce this contention locally, even on a slightly larger
> > > >
> > > > With netperf client number equal to nr_cpu?
> > >
> > > No, that confusion was only the first time around. See below, though;
> > > I'm not sure what insights are to be gained by continuing to tinker
> > > with netperf runs.
> > >
> > > > > Skylake. Not really sure what to make of the difference here. Perhaps
> > > > > it's because you're running with CONFIG_SCHED_CORE=y? What is the
> > > >
> > > > Yes I had that config on but I didn't tag any tasks or groups.
> > > >
> > > > > change in throughput when you run the default workload on your SKL?
> > > >
> > > > The throughput dropped a little with SWQUEUE:
> > > >
> > > >             avg_throughput     native_queued_spin_lock_slowpath%
> > > > NO_SWQUEUE: 9528.061111111108  0.09%
> > > > SWQUEUE:    8984.369722222222  8.05%
> > > >
> > > > avg_throughput: the average of all netperf clients' individual
> > > > throughputs; higher is better.
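(For reference, that is a drop of (9528.06 - 8984.37) / 9528.06, i.e.
roughly 5.7%, alongside the slowpath share rising from 0.09% to 8.05%.)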
> > > >
> > > > I run this workload like this:
> > > > "
> > > > netserver
> > > >
> > > > for i in `seq 72`; do
> > > >         netperf -l 60 -n 72 -6 &
> > > > done
> > > >
> > > > sleep 30
> > > > perf record -ag -e cycles:pp -- sleep 5 &
> > > >
> > > > wait
> > > > "
> > > > (the '-n 72' should be redundant but I just keep it there)
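(As an aside, the run script above discards each client's output; a
rough sketch, not from the thread, of how avg_throughput could be
collected -- assuming the default netperf test, whose throughput figure
is the last field of the last non-empty output line:

    for i in `seq 72`; do
        netperf -l 60 -n 72 -6 > /tmp/netperf.$i.out &
    done
    wait
    # average the per-client throughputs; the parsing is illustrative
    for f in /tmp/netperf.*.out; do
        awk 'NF { t = $NF } END { print t }' "$f"
    done | awk '{ sum += $1; n++ } END { printf "avg: %.2f\n", sum/n }'

The /tmp file names and the exact parsing are assumptions, not details
from the thread.)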
> > >
> > > At this point I'd say we've spent quite a bit of time discussing netperf
> > > results. We understand where the contention is coming from, and yes,
> > > we've established that there are going to be some configurations where
> > > swqueue is not well suited. We've also established that there are
> > > configurations where it will and does perform well, including on
> > > netperf.
> > >
> > > I'm not sure what we're hoping to gain by continuing to run various
> > > netperf workloads with your specific parameters?
> >
> > I don't quite follow you.
> >
> > I thought we were in the process of figuring out why, for the same
> > workload (netperf/default_mode/nr_client=nr_cpu) on two similar machines
> > (both Skylake), you saw no contention while I saw some, so I tried to be
> > exact about how I run the workload.
>
> I just reran the workload on a 26 core / 52 thread Cooper Lake using
> your exact command below and still don't observe any contention
> whatsoever on the swqueue lock:
Well, it's a puzzle to me.
But as you said below, I guess I'll just move on.
Thanks,
Aaron
>
> for i in `seq 72`; do
>         netperf -l 60 -n 72 -6 &
> done
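(As an aside, the call graphs recorded with -ag above can show directly
which locks feed native_queued_spin_lock_slowpath; a minimal sketch
using stock perf options, not a command from the thread:

    perf record -ag -e cycles:pp -- sleep 5
    # print the callers recorded under the slowpath symbol
    perf report --stdio --no-children | \
        grep -A 20 native_queued_spin_lock_slowpath

The callers listed under the symbol should make it clear whether the
cycles are on the swqueue lock or elsewhere.)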
>
> > If that's not the case, then yes, there is not much value in continuing
> > this discussion.
>
> We can iterate until we find out why we're seeing slightly different
> contention (different configs, different amounts of RAM, maybe you have
> turbo enabled or other things running on your host, etc.), but I don't
> see what that would tell us that would meaningfully drive the discussion
> forward for the patch set. Is there anything in particular you're trying
> to determine, and/or do you have reason to think that the contention
> you're observing is due to something other than a lot of tasks waking up
> at the same time, just as it was with UDP_RR?
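(As an aside, two of the suspected differences are easy to check; a
quick sketch, assuming the intel_pstate driver is in use and a
distro-style /boot config -- neither command is from the thread:

    # 1 means turbo is disabled under intel_pstate
    cat /sys/devices/system/cpu/intel_pstate/no_turbo
    # compare scheduler-related config between the two hosts
    grep CONFIG_SCHED_CORE /boot/config-$(uname -r)

Comparing those outputs across the two hosts would rule the config and
turbo theories in or out.)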
>
> Thanks,
> David