Message-ID: <20230405213117.jx2t5z3liowbr5su@parnassus.localdomain>
Date: Wed, 5 Apr 2023 17:31:17 -0400
From: Daniel Jordan <daniel.m.jordan@...cle.com>
To: Aaron Lu <aaron.lu@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Yu Chen <yu.c.chen@...el.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Tim Chen <tim.c.chen@...el.com>,
Nitin Tekchandani <nitin.tekchandani@...el.com>,
Waiman Long <longman@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Make tg->load_avg per node

On Tue, Apr 04, 2023 at 11:15:40PM +0800, Aaron Lu wrote:
> On Mon, Mar 27, 2023 at 01:39:55PM +0800, Aaron Lu wrote:
> [...]
> > Another observation of this workload is that it has a lot of wakeup-time
> > task migrations, which is why update_load_avg() and update_cfs_group()
> > show a noticeable cost. When this workload runs as N instances (N >= 2),
> > with sysbench's nr_threads set to nr_cpu/N, wakeup-time task migrations
> > are greatly reduced and the overhead from those two functions also drops
> > a lot. It's not yet clear to me why running multiple instances reduces
> > task migrations on the wakeup path.
>
> Regarding this observation, I have some findings. TL;DR: the 1-instance
> setup's overall CPU utilization is lower than that of the N >= 2 instance
> setups. As a result, sis() (select_idle_sibling()) is more likely to find
> idle CPUs in the 1-instance setup, which is why it has more migrations.
>
> More details:
>
> For the 1-instance setup with nr_thread=nr_cpu=224, during a 5s window
> there are 10 million calls to select_idle_sibling() and 6.1 million
> migrations. Of these migrations, 4.6 million come from select_idle_cpu()
> and 1.3 million come from recent_cpu.
> mpstat of this time window:
> Average:  NODE    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:   all   45.15    0.00   18.59    0.00    0.00   17.29    0.00    0.00    0.00   18.98
> Average:     0   38.14    0.00   17.29    0.00    0.00   14.77    0.00    0.00    0.00   29.80
> Average:     1   52.07    0.00   19.88    0.00    0.00   19.78    0.00    0.00    0.00    8.28

Aha. It takes one instance of nr_thread=(3/4)*nr_cpu to get this
overall utilization on my aforementioned Xeon, but then I see 3-4% for
both functions in the profile. I'll poke at it some more and see how much
it hurts across more loads; that might take a bit, though.

> For the 4-instance setup with nr_thread=56, during a 5s window there are
> 15 million calls to select_idle_sibling() and only 30k migrations.
> select_idle_cpu() is called 15 million times, but only 23k of those calls
> pass the sd_share->nr_idle_scan != 0 test.
> mpstat of this time window:
> Average:  NODE    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:   all   68.54    0.00   21.54    0.00    0.00    8.35    0.00    0.00    0.00    1.58
> Average:     0   70.05    0.00   20.92    0.00    0.00    8.17    0.00    0.00    0.00    0.87
> Average:     1   67.03    0.00   22.16    0.00    0.00    8.53    0.00    0.00    0.00    2.29
>
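Whether select_idle_cpu() gets past the sd_share->nr_idle_scan != 0 test depends on how SIS_UTIL sizes the idle scan. The following is a minimal userspace sketch, assuming the quadratic heuristic of update_idle_cpu_scan() in kernel/sched/fair.c, SCHED_CAPACITY_SCALE = 1024, an LLC imbalance_pct of 117, and a hypothetical 112-CPU LLC; util_to_sum() is a hypothetical helper. It is not the kernel code, and sum_util (a sum of util_avg) only loosely tracks mpstat busy time. Under those assumptions the scan budget reaches zero once average LLC utilization passes about 100/117, roughly 85%, which would be consistent with the 4- and 8-instance runs (idle at or below 2.3% per node) almost never passing the test.

```python
# Userspace model of the SIS_UTIL scan-depth heuristic, using integer
# division to mimic the kernel's do_div() truncation. Constants are
# assumptions: SCHED_CAPACITY_SCALE = 1024, imbalance_pct = 117 (LLC).
SCHED_CAPACITY_SCALE = 1024


def nr_idle_scan(sum_util: int, llc_weight: int, pct: int = 117) -> int:
    """Number of CPUs select_idle_cpu() may scan in one LLC."""
    x = sum_util // llc_weight                      # per-CPU util, scaled to 1024
    tmp = x * x * pct * pct // (10000 * SCHED_CAPACITY_SCALE)
    tmp = min(tmp, SCHED_CAPACITY_SCALE)            # clamp so y never goes negative
    y = SCHED_CAPACITY_SCALE - tmp                  # fraction of the LLC to scan
    return y * llc_weight // SCHED_CAPACITY_SCALE


def util_to_sum(busy_fraction: float, llc_weight: int) -> int:
    """Hypothetical helper: express a busy fraction as an LLC sum_util."""
    return int(busy_fraction * llc_weight * SCHED_CAPACITY_SCALE)


if __name__ == "__main__":
    LLC_WEIGHT = 112  # hypothetical LLC size, not taken from the thread
    for busy in (0.70, 0.81, 0.85, 0.90, 0.98):
        scan = nr_idle_scan(util_to_sum(busy, LLC_WEIGHT), LLC_WEIGHT)
        print(f"util {busy:.0%}: nr_idle_scan = {scan}")
```

With these assumptions the model allows 37 CPUs to be scanned at 70% utilization, 11 at 81%, 1 at 85% and none from 90% up. The 70% and 81% points sit near node 0 and the overall busy time of the 1-instance run, while both nodes in the 4- and 8-instance runs are well past the cutoff.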
> For the 8-instance setup with nr_thread=28, during a 5s window there are
> 16 million calls to select_idle_sibling() and 9.6k migrations.
> select_idle_cpu() is called 16 million times, but none of those calls
> pass the sd_share->nr_idle_scan != 0 test.
> mpstat of this time window:
> Average:  NODE    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:   all   70.29    0.00   20.99    0.00    0.00    8.28    0.00    0.00    0.00    0.43
> Average:     0   71.58    0.00   19.98    0.00    0.00    8.04    0.00    0.00    0.00    0.40
> Average:     1   69.00    0.00   22.01    0.00    0.00    8.52    0.00    0.00    0.00    0.47
>
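The gap between the setups is easier to see as migrations per select_idle_sibling() call. This sketch only redoes the arithmetic on the figures quoted above for the 5s windows; nothing here is measured, and the dictionary layout is just an illustrative choice.

```python
# Migrations per select_idle_sibling() call, using only the figures
# reported in the mail for each 5 s window.
SETUPS = {
    # instances: (select_idle_sibling() calls, migrations)
    1: (10_000_000, 6_100_000),
    4: (15_000_000, 30_000),
    8: (16_000_000, 9_600),
}


def migration_rate(calls: int, migrations: int) -> float:
    """Fraction of sis() calls that ended in a migration."""
    return migrations / calls


if __name__ == "__main__":
    for n, (calls, migs) in SETUPS.items():
        print(f"{n} instance(s): {migration_rate(calls, migs):.2%} of sis() calls migrated")

    # 1-instance breakdown: select_idle_cpu() vs recent_cpu; the remainder
    # is not attributed to a specific path in the mail.
    from_idle_scan, from_recent = 4_600_000, 1_300_000
    other = SETUPS[1][1] - from_idle_scan - from_recent
    print(f"1 instance: {other:,} migrations not attributed")
```

This gives 61% for 1 instance, 0.2% for 4 instances and 0.06% for 8 instances, a drop of roughly three orders of magnitude even though the multi-instance runs make more sis() calls.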
> On a side note: when sd_share->nr_idle_scan > 0 and has_idle_core is
> true, sd_share->nr_idle_scan is not actually respected. Is this intended?
> It seems to say: if there is an idle core, try hard and ignore SIS_UTIL
> to find it, right?