Date:   Thu, 30 Mar 2023 15:51:57 -0400
From:   Daniel Jordan <daniel.m.jordan@...cle.com>
To:     Aaron Lu <aaron.lu@...el.com>
Cc:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Tim Chen <tim.c.chen@...el.com>,
        Nitin Tekchandani <nitin.tekchandani@...el.com>,
        Waiman Long <longman@...hat.com>,
        Yu Chen <yu.c.chen@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Make tg->load_avg per node

On Thu, Mar 30, 2023 at 01:46:02PM -0400, Daniel Jordan wrote:
> Hi Aaron,
> 
> On Wed, Mar 29, 2023 at 09:54:55PM +0800, Aaron Lu wrote:
> > On Wed, Mar 29, 2023 at 02:36:44PM +0200, Dietmar Eggemann wrote:
> > > On 28/03/2023 14:56, Aaron Lu wrote:
> > > > On Tue, Mar 28, 2023 at 02:09:39PM +0200, Dietmar Eggemann wrote:
> > > >> On 27/03/2023 07:39, Aaron Lu wrote:
> > And I'm not sure if you did the profile on different nodes? I normally
> > choose 4 CPUs on each node and do 'perf record -C' on them, both to get
> > an idea of how the different nodes behave and to reduce the record size.
> > Normally, when the tg is allocated on node 0, node 1's profile shows
> > higher cycles for update_cfs_group() and update_load_avg().
> 
> Wouldn't the choice of CPUs have a big effect on the data, depending on
> where sysbench or postgres tasks run?

Oh, probably not with NCPU threads though, since the load would be
pretty even across CPUs.  So I think I see where you're coming from.
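For reference, Aaron's per-node sampling could be scripted along these
lines.  This is only a sketch: the two-node layout and the CPU lists are
assumptions (check /sys/devices/system/node/node*/cpulist for the real
topology), and it prints the commands rather than running them:

```shell
# Dry run: print per-node 'perf record -C' commands as Aaron describes,
# sampling 4 CPUs on each node.  CPU ids are illustrative assumptions.
NODE0_CPUS=0,1,2,3
NODE1_CPUS=128,129,130,131
for n in 0 1; do
    eval "cpus=\$NODE${n}_CPUS"
    echo "perf record -C $cpus -o perf.node$n.data -- sleep 5"
    echo "perf report -i perf.node$n.data --sort=dso,symbol"
done
```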

> > I guess your setup may have a much lower migration number?
> 
> I also tried this and, sure enough, didn't see as many migrations on
> either of two systems.  I followed your steps in a container on top of a
> plain 6.2 kernel, with the cpu controller enabled (weight only).  I
> increased the connection count and buffer size to suit each machine, and
> took Chen's suggestion to try without NUMA balancing.
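The NUMA balancing toggle referred to here is the kernel sysctl; a
guarded sketch that only reads the current setting (disabling it needs
root):

```shell
# Show the current automatic NUMA balancing setting.  To disable (as
# Chen suggested), as root: echo 0 > /proc/sys/kernel/numa_balancing
if [ -r /proc/sys/kernel/numa_balancing ]; then
    cat /proc/sys/kernel/numa_balancing
else
    echo "numa_balancing knob not present (CONFIG_NUMA_BALANCING off?)"
fi
```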
> 
> AMD EPYC 7J13 64-Core Processor
>     2 sockets * 64 cores * 2 threads = 256 CPUs
> 
> sysbench: nr_threads=256
> 
> All observability data was taken at one minute in and using one tool at
> a time.
> 
>     @migrations[1]: 1113
>     @migrations[0]: 6152
>     @wakeups[1]: 8871744
>     @wakeups[0]: 9773321
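The per-node counts above could be collected with a bpftrace aggregation
along these lines.  A sketch only: it assumes 2 nodes with 128
contiguous CPUs each (so cpu/128 approximates the node id), and it
prints the command as a dry run, since actually tracing needs root and
bpftrace installed:

```shell
# bpftrace program counting task migrations and wakeups per (approximate)
# NUMA node; dest_cpu and target_cpu are fields of the sched tracepoints.
PROG='
tracepoint:sched:sched_migrate_task { @migrations[args->dest_cpu / 128] = count(); }
tracepoint:sched:sched_wakeup       { @wakeups[args->target_cpu / 128] = count(); }
'
echo "bpftrace -e \"$PROG\""
```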
> 
>     # profiled the whole system for 5 seconds, reported w/ --sort=dso,symbol
>     0.38%       update_load_avg
>     0.13%       update_cfs_group
> 
> Using higher (nr_threads=380) and lower (nr_threads=128) load doesn't
> change these numbers much.
> 
> The topology of my machine is different from yours, but it's the biggest
> I have, and I'm assuming cpu count is more important than topology when
> reproducing the remote accesses.  I also tried on
> 
> Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
>     2 sockets * 32 cores * 2 thread = 128 CPUs
> 
> with nr_threads=128 and got similar results.
> 
> I'm guessing you've left all the sched knobs alone?  Sharing those and
> your kconfig might help close the gap.  Migrations do increase to near
> what you were seeing when I disable SIS_UTIL (with SIS_PROP already off)
> on the Xeon, and profiling then shows 4-5% apiece for the functions you
> mention, but turning SIS_UTIL off is an odd thing to do.
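For anyone reproducing: SIS_UTIL and SIS_PROP are scheduler feature
flags toggled through debugfs.  A read-only sketch, assuming the 6.x
path /sys/kernel/debug/sched/features (older kernels use
/sys/kernel/debug/sched_features); disabling a feature needs root and
uses the NO_ prefix:

```shell
# Show current scheduler feature flags if debugfs is mounted and readable.
for f in /sys/kernel/debug/sched/features /sys/kernel/debug/sched_features; do
    if [ -r "$f" ]; then
        cat "$f"
        break
    fi
done
# To disable, as root:  echo NO_SIS_UTIL > /sys/kernel/debug/sched/features
```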
