Message-ID: <f389a220-b628-575a-7af1-d897ee5730cc@huawei.com>
Date: Sat, 14 Sep 2024 15:03:46 +0800
From: zhengzucheng <zhengzucheng@...wei.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
CC: Waiman Long <longman@...hat.com>, <peterz@...radead.org>,
	<juri.lelli@...hat.com>, <dietmar.eggemann@....com>, <rostedt@...dmis.org>,
	<bsegall@...gle.com>, <mgorman@...e.de>, <vschneid@...hat.com>,
	<oleg@...hat.com>, Frederic Weisbecker <frederic@...nel.org>,
	<mingo@...nel.org>, <peterx@...hat.com>, <tj@...nel.org>,
	<tjcao980311@...il.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [Question] sched:the load is unbalanced in the VM overcommitment scenario


On 2024/9/13 23:55, Vincent Guittot wrote:
> On Fri, 13 Sept 2024 at 06:03, zhengzucheng <zhengzucheng@...wei.com> wrote:
>> In a VM overcommitment scenario with a 1:2 overcommitment ratio, 8 CPUs
>> are shared by two 8-vCPU VMs, i.e. 16 vCPUs are bound to 8 CPUs.
>> However, one VM gets the equivalent of only 2 CPUs' worth of time while
>> the other gets 6.
>> The host has 80 CPUs in one sched domain, and the other CPUs are idle.
>> The root cause is that the host load is unbalanced: some vCPUs occupy
>> CPUs exclusively. When the CPU that triggers load balancing calculates
>> the imbalance value, env->imbalance is set to 0 because
>> local->avg_load > sds->avg_load, so the load balance attempt fails.
>> The relevant logic:
>> https://github.com/torvalds/linux/commit/91dcf1e8068e9a8823e419a7a34ff4341275fb70
>>
>>
>> This is expected behaviour from the kernel load balancer's point of view,
>> but it is not reasonable from the perspective of the VM users.
>> In cgroup v1, setting cpuset.sched_load_balance=0 to rebuild the sched
>> domains works around it.
>> Is there any other way to fix this problem? Thanks.
> I'm not sure I understand your setup and why load balancing is not
> distributing the 16 vCPUs correctly between the 8 CPUs.
>
> From your test case description below, you have 8 always-running
> threads in cgroup A and 8 always-running threads in cgroup B, and the 2
> cgroups share only 8 of the 80 CPUs. This should not be a problem for
> load balancing. I tried something similar (although not exactly the same)
> with cgroup v2 and rt-app, and I don't see any noticeable imbalance.
>
> Do you have more details that you can share about your system?
>
> Which kernel version are you using? Which arch?

kernel version: 6.11.0-rc7
arch: x86_64, with cgroup v1
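
To make the quoted avg_load reasoning concrete, below is a minimal
standalone sketch (not kernel code) of the comparison that ends with
env->imbalance = 0 in calculate_imbalance(). The load and capacity
numbers are illustrative assumptions for the 8-busy-CPUs-out-of-80 case,
not values taken from a real trace.

/* toy model of the avg_load checks referenced above; compile with gcc */
#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024UL

int main(void)
{
        /* assume full capacity 1024 per logical CPU; one core = 2 SMT CPUs */
        unsigned long group_capacity = 2 * SCHED_CAPACITY_SCALE;

        /* local core: 2 always-running tasks -> roughly fully loaded */
        unsigned long local_load = 2 * 1024;
        /* busiest core: 3 always-running tasks sharing the same 2 CPUs */
        unsigned long busiest_load = 3 * 1024;

        /* whole MC domain: 40 cores (80 CPUs), but only 16 running tasks */
        unsigned long total_capacity = 40 * group_capacity;
        unsigned long total_load = 16 * 1024;

        unsigned long local_avg = local_load * SCHED_CAPACITY_SCALE / group_capacity;
        unsigned long busiest_avg = busiest_load * SCHED_CAPACITY_SCALE / group_capacity;
        unsigned long sds_avg = total_load * SCHED_CAPACITY_SCALE / total_capacity;

        printf("local_avg=%lu busiest_avg=%lu sds_avg=%lu\n",
               local_avg, busiest_avg, sds_avg);

        /* paraphrase of the two "don't pull" checks in calculate_imbalance() */
        if (local_avg >= busiest_avg || local_avg >= sds_avg)
                printf("imbalance = 0 -> no task is pulled\n");
        else
                printf("imbalance > 0 -> tasks could be pulled\n");
        return 0;
}

With these numbers local_avg (1024) is well above sds_avg (204), because the
domain-wide average is diluted by the many idle CPUs that the cpuset forbids
these tasks to use, so the balance attempt gives up even though the busiest
core runs 3 tasks against the local core's 2.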

>> Abstracted reproduction case:
>> 1.environment information:
>>
>> [root@...alhost ~]# cat /proc/schedstat
>>
>> cpu0
>> domain0 00000000,00000000,00010000,00000000,00000001
>> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
>> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>> cpu1
>> domain0 00000000,00000000,00020000,00000000,00000002
>> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
>> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>> cpu2
>> domain0 00000000,00000000,00040000,00000000,00000004
>> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
>> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>> cpu3
>> domain0 00000000,00000000,00080000,00000000,00000008
>> domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
>> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
> Is it correct to assume that domain0 is SMT, domain1 is MC and domain2 is PKG?
> And that CPUs 80-83 are in the other group of PKG? And LLC is at domain1 level?

domain0 is SMT and domain1 is MC.
thread_siblings_list: (0,80) (1,81) (2,82) (3,83)
LLC is at domain1 level.
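
For reference, the schedstat masks above are comma-separated 32-bit hex
words with the most significant word printed first. The small standalone
helper below (not part of the test case in this thread) decodes them;
with cpu0's domain0 and domain1 masks it prints CPUs 0 and 80 (the SMT
pair) and CPUs 0-39 plus 80-119 (the 80-CPU MC/LLC domain).

/* decode /proc/schedstat style cpumask strings into a CPU list */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void decode_mask(const char *mask)
{
        unsigned int words[64];
        int nwords = 0;
        char buf[256];
        char *tok;

        /* split "00000000,00ffffff,..." into 32-bit words, MSW first */
        snprintf(buf, sizeof(buf), "%s", mask);
        for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ","))
                words[nwords++] = (unsigned int)strtoul(tok, NULL, 16);

        printf("%s ->", mask);
        for (int cpu = 0; cpu < nwords * 32; cpu++) {
                /* the first word in the string holds the highest CPU numbers */
                int word = nwords - 1 - cpu / 32;

                if (words[word] & (1u << (cpu % 32)))
                        printf(" %d", cpu);
        }
        printf("\n");
}

int main(void)
{
        /* cpu0's domain0 (SMT) and domain1 (MC) masks from the report above */
        decode_mask("00000000,00000000,00010000,00000000,00000001");
        decode_mask("00000000,00ffffff,ffff0000,000000ff,ffffffff");
        return 0;
}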

>> 2.test case:
>>
>> vcpu.c
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>>           sleep(20);   /* wait until test.sh has started and attached all tasks */
>>           while (1);   /* busy loop: each task models an always-running vCPU */
>>           return 0;
>> }
>>
>> gcc vcpu.c -o vcpu
>> -----------------------------------------------------------------
>> test.sh
>>
>> #!/bin/bash
>>
>> #vcpu1
>> mkdir /sys/fs/cgroup/cpuset/vcpu_1
>> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.cpus
>> echo 0 > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.mems
>> for i in {1..8}
>> do
>>           ./vcpu &
>>           pid=$!
>>           sleep 1
>>           echo $pid > /sys/fs/cgroup/cpuset/vcpu_1/tasks
>> done
>>
>> #vcpu2
>> mkdir /sys/fs/cgroup/cpuset/vcpu_2
>> echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.cpus
>> echo 0 > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.mems
>> for i in {1..8}
>> do
>>           ./vcpu &
>>           pid=$!
>>           sleep 1
>>           echo $pid > /sys/fs/cgroup/cpuset/vcpu_2/tasks
>> done
>> ------------------------------------------------------------------
>> [root@...alhost ~]# ./test.sh
>>
>> [root@...alhost ~]# top -d 1 -c -p $(pgrep -d',' -f vcpu)
>>
>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>> 14591 root      20   0    2448   1012    928 R 100.0   0.0 13:10.73 ./vcpu
>> 14582 root      20   0    2448   1012    928 R 100.0   0.0 13:12.71 ./vcpu
>> 14606 root      20   0    2448    872    784 R 100.0   0.0 13:09.72 ./vcpu
>> 14620 root      20   0    2448    916    832 R 100.0   0.0 13:07.72 ./vcpu
>> 14622 root      20   0    2448    920    836 R 100.0   0.0 13:06.72 ./vcpu
>> 14629 root      20   0    2448    920    832 R 100.0   0.0 13:05.72 ./vcpu
>> 14643 root      20   0    2448    924    836 R  21.0   0.0 2:37.13 ./vcpu
>> 14645 root      20   0    2448    868    784 R  21.0   0.0 2:36.51 ./vcpu
>> 14589 root      20   0    2448    900    816 R  20.0   0.0 2:45.16 ./vcpu
>> 14608 root      20   0    2448    956    872 R  20.0   0.0 2:42.24 ./vcpu
>> 14632 root      20   0    2448    872    788 R  20.0   0.0 2:38.08 ./vcpu
>> 14638 root      20   0    2448    924    840 R  20.0   0.0 2:37.48 ./vcpu
>> 14652 root      20   0    2448    928    844 R  20.0   0.0 2:36.42 ./vcpu
>> 14654 root      20   0    2448    924    840 R  20.0   0.0 2:36.14 ./vcpu
>> 14663 root      20   0    2448    900    816 R  20.0   0.0 2:35.38 ./vcpu
>> 14669 root      20   0    2448    868    784 R  20.0   0.0 2:35.70 ./vcpu
>>
> .
