Message-ID: <20230330195157.afbqtusnnbnvtlyz@parnassus.localdomain>
Date: Thu, 30 Mar 2023 15:51:57 -0400
From: Daniel Jordan <daniel.m.jordan@...cle.com>
To: Aaron Lu <aaron.lu@...el.com>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Tim Chen <tim.c.chen@...el.com>,
Nitin Tekchandani <nitin.tekchandani@...el.com>,
Waiman Long <longman@...hat.com>,
Yu Chen <yu.c.chen@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Make tg->load_avg per node

On Thu, Mar 30, 2023 at 01:46:02PM -0400, Daniel Jordan wrote:
> Hi Aaron,
>
> On Wed, Mar 29, 2023 at 09:54:55PM +0800, Aaron Lu wrote:
> > On Wed, Mar 29, 2023 at 02:36:44PM +0200, Dietmar Eggemann wrote:
> > > On 28/03/2023 14:56, Aaron Lu wrote:
> > > > On Tue, Mar 28, 2023 at 02:09:39PM +0200, Dietmar Eggemann wrote:
> > > >> On 27/03/2023 07:39, Aaron Lu wrote:
> > Also, I'm not sure whether you profiled different nodes. I normally choose
> > 4 CPUs from each node and run 'perf record -C' on them, to get an idea
> > of how the different nodes behave and also to reduce the record size.
> > Normally, when tg is allocated on node 0, then node 1's profile would
> > show higher cycles for update_cfs_group() and update_load_avg().
>
> Wouldn't the choice of CPUs have a big effect on the data, depending on
> where sysbench or postgres tasks run?
Oh, probably not with NCPU threads though, since the load would be
pretty even, so I think I see where you're coming from.
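
For reference, the per-node sampling described above (a few CPUs from each
node, passed to 'perf record -C') could be set up roughly like this. It is
only a sketch: the helper names are made up for illustration, it simply takes
the first four CPUs that sysfs lists for each node, and the 5 second duration
is an assumption borrowed from the whole-system profile quoted below.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8,10-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def pick_cpus(node_cpulists, per_node=4):
    """Take the first `per_node` CPUs of each node (hypothetical selection policy)."""
    return {node: parse_cpulist(cpulist)[:per_node]
            for node, cpulist in sorted(node_cpulists.items())}


def perf_command(cpus, seconds=5):
    """Build a 'perf record -C <cpus>' command line for one node's CPUs."""
    cpu_arg = ",".join(str(c) for c in cpus)
    return ["perf", "record", "-C", cpu_arg, "--", "sleep", str(seconds)]


def read_node_cpulists(root="/sys/devices/system/node"):
    """Read nodeN/cpulist for every NUMA node exposed in sysfs."""
    lists = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "cpulist")):
        node = int(re.search(r"node(\d+)", path).group(1))
        with open(path) as f:
            lists[node] = f.read()
    return lists


if __name__ == "__main__":
    # Print one command per node; run them one at a time and compare profiles.
    for node, cpus in pick_cpus(read_node_cpulists()).items():
        if cpus:
            print(f"node {node}:", " ".join(perf_command(cpus)))
```

With NCPU threads the load should be spread evenly, so which four CPUs are
picked within a node probably matters little; with fewer threads the choice
would need more care.
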
> > I guess your setup may have a much lower migration number?
>
> I also tried this and sure enough didn't see as many migrations on
> either of two systems. I used a container with your steps with a plain
> 6.2 kernel underneath, and the cpu controller is on (weight only). I
> increased connections and buffer size to suit each machine, and took
> Chen's suggestion to try without numa balancing.
>
> AMD EPYC 7J13 64-Core Processor
> 2 sockets * 64 cores * 2 threads = 256 CPUs
>
> sysbench: nr_threads=256
>
> All observability data was taken at one minute in and using one tool at
> a time.
>
> @migrations[1]: 1113
> @migrations[0]: 6152
> @wakeups[1]: 8871744
> @wakeups[0]: 9773321
>
> # profiled the whole system for 5 seconds, reported w/ --sort=dso,symbol
> 0.38% update_load_avg
> 0.13% update_cfs_group
>
> Using higher (nr_threads=380) and lower (nr_threads=128) load doesn't
> change these numbers much.
>
> The topology of my machine is different from yours, but it's the biggest
> I have, and I'm assuming cpu count is more important than topology when
> reproducing the remote accesses. I also tried on
>
> Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
> 2 sockets * 32 cores * 2 threads = 128 CPUs
>
> with nr_threads=128 and got similar results.
>
> I'm guessing you've left all sched knobs alone? Maybe sharing those and
> the kconfig would help close the gap. Migrations do increase to near
> what you were seeing when I disable SIS_UTIL (with SIS_PROP already off)
> on the Xeon, and I see 4-5% apiece for the functions you mention when
> profiling, but turning SIS_UTIL off is an odd thing to do.
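
To compare sched knobs between the two setups, something like the following
could report the state of SIS_UTIL and SIS_PROP. It is a sketch that assumes
debugfs is mounted at /sys/kernel/debug and that the features file lives at
sched/features under it (it needs root to read); parse_sched_features is a
made-up helper name. Disabled features are listed with a NO_ prefix.

```python
FEATURES_PATH = "/sys/kernel/debug/sched/features"  # assumed debugfs location


def parse_sched_features(text):
    """Map each feature name to True (enabled) or False (listed as NO_<name>)."""
    features = {}
    for token in text.split():
        if token.startswith("NO_"):
            features[token[3:]] = False
        else:
            features[token] = True
    return features


if __name__ == "__main__":
    try:
        with open(FEATURES_PATH) as f:
            features = parse_sched_features(f.read())
    except OSError as err:
        # Typically means debugfs is not mounted or we are not root.
        print(f"cannot read {FEATURES_PATH}: {err}")
    else:
        for name in ("SIS_UTIL", "SIS_PROP"):
            state = features.get(name)
            label = "absent" if state is None else ("on" if state else "off")
            print(f"{name}: {label}")
```

A feature is turned off by writing its NO_ name (for example NO_SIS_UTIL) to
the same file as root, and turned back on by writing the plain name.
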