Message-ID: <70494547-a538-baf5-0554-6788ac2b45e8@linux.alibaba.com>
Date: Mon, 24 Jul 2023 14:57:57 +0800
From: luoben@...ux.alibaba.com
To: Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
pbonzini@...hat.com, mikelley@...rosoft.com, yu.c.chen@...el.com,
"Kenan.Liu" <Kenan.Liu@...ux.alibaba.com>, mingo@...hat.com,
juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] Adjust CFS loadbalance to adapt QEMU CPU
topology.
On 2023/7/21 17:13, Peter Zijlstra <peterz@...radead.org> wrote:
> On Fri, Jul 21, 2023 at 10:33:44AM +0200, Vincent Guittot wrote:
> > On Fri, 21 Jul 2023 at 04:59, Kenan.Liu <Kenan.Liu@...ux.alibaba.com> wrote:
>
> >> The SMT topology in the qemu native x86 CPU model is (0,1),…,(n,n+1),…,
> >> but the SMT topology normally seen on a physical machine is like
> >> (0,n),(1,n+1),…, where n is the total number of cores in the machine.
> >>
> >> The imbalance happens when the number of runnable threads is less
> >> than the number of hyperthreads: select_idle_core() is called to
> >> decide which CPU the woken-up task runs on.
> >>
> >> select_idle_core() returns the checked CPU number if the whole core
> >> is idle. Conversely, if either HT of the core is busy,
> >> select_idle_core() clears the whole core out of the cpumask and
> >> checks the next core.
> >>
> >> select_idle_core():
> >>     …
> >>     if (idle)
> >>         return core;
> >>
> >>     cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
> >>     return -1;
> >>
> >> In this manner, except at the very beginning of the
> >> for_each_cpu_wrap() loop, the HT with an even ID number is always
> >> checked first, and is returned to the caller if the whole core is
> >> idle, so the odd-numbered HT almost never has a chance to be selected.
> >>
> >> select_idle_cpu():
> >>     …
> >>     for_each_cpu_wrap(cpu, cpus, target + 1) {
> >>         if (has_idle_core) {
> >>             i = select_idle_core(p, cpu, cpus, &idle_cpu);
> >>
> >> And this will NOT happen when the SMT topology is (0,n),(1,n+1),…,
> >> because when the loop starts from the bottom half of the SMT numbers,
> >> the HTs with larger numbers are checked first, and when it starts from
> >> the top half, their smaller-numbered siblings take first place in the
> >> inner-core search.
> >
> > But why is it a problem ? Your system is almost idle and 1 HT per core
> > is used. Who cares to select evenly one HT or the other as long as we
> > select an idle core in priority ?
>
> Right, why is this a problem? Hyperthreads are supposed to be symmetric,
> it doesn't matter which of the two are active, the important thing is to
> only have one active if we can.
>
> (Unlike Power7, they have asymmetric SMT)
>
Hi Peter and Vincent,
Some upper-level monitoring logic may take CPU usage as a metric for
scaling computing resources. Imbalanced scheduling can create the illusion
of CPU resource scarcity, leading the upper-level scheduling system to
trigger resource expansion more often than necessary, which actually
wastes resources. So we think this may be a problem.
Could you please take a further look at PATCH#2? We found that the default
'nr' value did not perform well in our scenario, and we believe a tunable
variable would be more appropriate.
Our scenario is as follows:
16 processes are running in a 32-CPU VM, with 8 threads per process,
and they are all running the same job.
The expected result is that CPU usage is evenly distributed, but we
found that the even-numbered cores were preferred by scheduling
decisions and consumed more CPU (5%~20% more), mainly because of the
default value of nr=4. In this scenario, we found that nr=2 is more
suitable.
Thanks,
Ben