Message-ID: <70494547-a538-baf5-0554-6788ac2b45e8@linux.alibaba.com>
Date: Mon, 24 Jul 2023 14:57:57 +0800
From: luoben@...ux.alibaba.com
To: Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
pbonzini@...hat.com, mikelley@...rosoft.com, yu.c.chen@...el.com,
"Kenan.Liu" <Kenan.Liu@...ux.alibaba.com>, mingo@...hat.com,
juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] Adjust CFS loadbalance to adapt QEMU CPU
topology.
On 2023/7/21 17:13, Peter Zijlstra <peterz@...radead.org> wrote:
> On Fri, Jul 21, 2023 at 10:33:44AM +0200, Vincent Guittot wrote:
> > On Fri, 21 Jul 2023 at 04:59, Kenan.Liu <Kenan.Liu@...ux.alibaba.com> wrote:
>
> >> The SMT topology in the qemu native x86 CPU model is (0,1),…,(n,n+1),…,
> >> but the SMT topology normally seen on a physical machine is like
> >> (0,n),(1,n+1),…, where n is the total number of cores in the machine.
> >>
> >> The imbalance happens when the number of runnable threads is less
> >> than the number of hyperthreads: select_idle_core() is called to
> >> decide which CPU the woken-up task runs on.
> >>
> >> select_idle_core() returns the checked CPU number if the whole core
> >> is idle. Conversely, if either HT of the core is busy,
> >> select_idle_core() clears the whole core out of the cpumask and
> >> checks the next core.
> >>
> >> select_idle_core():
> >>     …
> >>     if (idle)
> >>         return core;
> >>
> >>     cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
> >>     return -1;
> >>
> >> In this manner, except at the very beginning of the
> >> for_each_cpu_wrap() loop, the HT with an even ID number is always
> >> checked first, and is returned to the caller if the whole core is
> >> idle, so the odd-numbered HT almost never has a chance to be selected.
> >>
> >> select_idle_cpu():
> >>     …
> >>     for_each_cpu_wrap(cpu, cpus, target + 1) {
> >>         if (has_idle_core) {
> >>             i = select_idle_core(p, cpu, cpus, &idle_cpu);
> >>
> >> And this will NOT happen when the SMT topology is (0,n),(1,n+1),…,
> >> because when the loop starts from the bottom half of the SMT numbers,
> >> the HTs with larger numbers are checked first, and when it starts from
> >> the top half, their smaller-numbered siblings take first place in the
> >> inner-core search.
> >
> > But why is it a problem ? Your system is almost idle and 1 HT per core
> > is used. Who cares to select evenly one HT or the other as long as we
> > select an idle core in priority ?
>
> Right, why is this a problem? Hyperthreads are supposed to be symmetric,
> it doesn't matter which of the two are active, the important thing is to
> only have one active if we can.
>
> (Unlike Power7, they have asymmetric SMT)
>
Hi Peter and Vincent,
Some upper-level monitoring logic may take CPU usage as a metric for
scaling computing resources. Imbalanced scheduling can create the illusion
of CPU resource scarcity, leading the upper-level scheduling system to
trigger resource expansion more often than necessary, which actually
wastes resources. So we think this may be a problem.
Could you please take a further look at PATCH#2? We found that the default
'nr' value did not perform well in our scenario, and we believe a tunable
variable would be more appropriate.
Our scenario is as follows:
16 processes are running in a 32-CPU VM, with 8 threads per process,
and they are all running the same job.
The expected result is that CPU usage is evenly distributed, but we
found that the even-numbered cores were preferred by scheduling
decisions and consumed more CPU (5%~20% more), mainly because of the
default value of nr=4. In this scenario, we found that nr=2 is more
suitable.
Thanks,
Ben