Message-ID: <5987be34-b527-4ff5-a17d-5f6f0dc94d6d@huawei.com>
Date: Mon, 27 Jun 2022 14:50:25 +0800
From: Zhang Qiao <zhangqiao22@...wei.com>
To: Tejun Heo <tj@...nel.org>, <mingo@...hat.com>,
<peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
CC: <lizefan.x@...edance.com>, <hannes@...xchg.org>,
<cgroups@...r.kernel.org>, lkml <linux-kernel@...r.kernel.org>,
<vschneid@...hat.com>, <dietmar.eggemann@....com>,
<bristot@...hat.com>, <bsegall@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>, <mgorman@...e.de>
Subject: [Question] The system may be stuck if there is a cpu cgroup
cpu.cfs_quota_us is very low
Hi all,
I'm debugging a problem.
The test case performs the following operations:
1) create a test task cgroup, set cpu.cfs_quota_us=2000, cpu.cfs_period_us=100000.
2) run 20 test_fork[1] test processes in the test task cgroup.
3) create 100 new containers:
for i in {1..100}; do docker run -itd --health-cmd="ls" --health-interval=1s ubuntu:latest bash; done
These operations are expected to succeed, with all 100 containers created. However, while the containers are being created,
the system gets stuck and container creation fails.
After debugging this, I found that the test_fork processes frequently sleep in freezer_fork()->mutex_lock()->might_sleep()
while holding the cgroup_threadgroup_rw_sem lock, as follows:
copy_process():
    cgroup_can_fork()                  ---> lock cgroup_threadgroup_rw_sem
    sched_cgroup_fork()
      ->task_fork_fair()
        ->update_curr()
          ->__account_cfs_rq_runtime()
              resched_curr()           ---> the quota is used up; TIF_NEED_RESCHED is set on current
    cgroup_post_fork()
      ->freezer_fork()
        ->mutex_lock()
          ->might_sleep()              ---> schedule(); the current task is throttled for a long time
      ->cgroup_css_set_put_fork()      ---> unlock cgroup_threadgroup_rw_sem
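To put rough numbers on the throttling step in the trace above, here is a minimal sketch of the bandwidth arithmetic. It assumes an idealized model that is not taken from the kernel source: the whole group runs on one CPU, the quota is refilled in full at each period boundary, and runtime slices are ignored. The helper names cfs_cpu_share_percent() and min_wall_time_us() are hypothetical.

```c
#include <assert.h>

/*
 * Share of one CPU that the group may consume, in percent.
 * Idealized: quota_us of runtime is granted once per period_us.
 */
double cfs_cpu_share_percent(long quota_us, long period_us)
{
    assert(quota_us > 0 && period_us > 0);
    return 100.0 * (double)quota_us / (double)period_us;
}

/*
 * Minimum wall-clock time, in microseconds, for the group to accumulate
 * work_us of CPU time. ASSUMPTION (not kernel behaviour in detail): the
 * group runs on a single CPU, burns its quota at the start of each
 * period, and is then throttled until the next period begins.
 */
long min_wall_time_us(long work_us, long quota_us, long period_us)
{
    long full_periods, rest;

    assert(work_us > 0 && quota_us > 0 && period_us >= quota_us);

    full_periods = work_us / quota_us;  /* periods whose quota is fully used */
    rest = work_us % quota_us;          /* runtime needed in the last period */

    if (rest == 0)
        /* the last full quota is consumed at the start of its own period */
        return (full_periods - 1) * period_us + quota_us;
    return full_periods * period_us + rest;
}
```

With the settings from step 1, cfs_cpu_share_percent(2000, 100000) gives 2.0, i.e. 2% of one CPU for the whole group. If the 20 test_fork processes together needed, say, 20000 us of CPU time (an illustrative figure, not a measurement), min_wall_time_us(20000, 2000, 100000) gives 902000 us, roughly 0.9 s. A task that was throttled between the lock and the unlock could be waiting for quota during much of that time.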
Because the task cgroup's cpu.cfs_quota_us is very small and the test_fork load is very heavy, test_fork
may be throttled for a long time. As a result, the cgroup_threadgroup_rw_sem read lock is held for a long time, and other
processes get stuck waiting for the lock:
1) a task forking a child waits at copy_process()->cgroup_can_fork();
2) an exiting task waits at exit_signals();
3) a task writing the cgroup.procs file waits at cgroup_file_write()->__cgroup1_procs_write();
...
Eventually the whole system may get stuck.
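The pile-up described above can be replayed with a toy model. This is only a sketch: it is not the kernel's implementation of cgroup_threadgroup_rw_sem, and it assumes that once a writer is waiting, new readers are held back until the writer has run. All toy_* names and replay_stall() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy writer-preferring read/write semaphore; a model, not kernel code. */
struct toy_rwsem {
    int  readers;         /* tasks currently holding the read side */
    bool writer_waiting;  /* a writer is waiting for readers to drain */
    bool writer_active;   /* a writer currently holds the lock */
};

/* Read side: refused while a writer is waiting or active (ASSUMPTION). */
bool toy_try_read(struct toy_rwsem *s)
{
    if (s->writer_waiting || s->writer_active)
        return false;
    s->readers++;
    return true;
}

void toy_read_unlock(struct toy_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

/* Write side: announces itself, then needs every reader to be gone. */
bool toy_try_write(struct toy_rwsem *s)
{
    if (s->writer_active)
        return false;
    s->writer_waiting = true;
    if (s->readers > 0)
        return false;          /* still waiting for readers */
    s->writer_waiting = false;
    s->writer_active = true;
    return true;
}

void toy_write_unlock(struct toy_rwsem *s)
{
    assert(s->writer_active);
    s->writer_active = false;
}

/* Replays the scenario and returns how many tasks were refused. */
int replay_stall(void)
{
    struct toy_rwsem sem = {0};
    int stuck = 0;
    bool ok;

    ok = toy_try_read(&sem);             /* test_fork: fork path takes the read side */
    assert(ok);
    /* test_fork is now throttled and does not reach the unlock. */
    if (!toy_try_write(&sem)) stuck++;   /* task writing cgroup.procs */
    if (!toy_try_read(&sem))  stuck++;   /* another task forking a child */
    if (!toy_try_read(&sem))  stuck++;   /* an exiting task */

    toy_read_unlock(&sem);               /* throttled task finally unlocks */
    ok = toy_try_write(&sem);            /* the writer can now proceed */
    assert(ok);
    toy_write_unlock(&sem);
    ok = toy_try_read(&sem);             /* and readers are admitted again */
    assert(ok);
    toy_read_unlock(&sem);
    (void)ok;
    return stuck;
}
```

In this model replay_stall() returns 3: the writer and both later readers are refused for as long as the throttled task keeps the read side, and all of them proceed once it unlocks.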
Does anyone know how to solve this, other than by changing cpu.cfs_quota_us?
[1] test_fork.c
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    pid_t pid;
    int count = 20;

    while (1) {
        /* Fork 'count' children; each child exits immediately. */
        for (int i = 0; i < count; i++) {
            if ((pid = fork()) < 0) {
                perror("fork");
                return 1;
            } else if (pid == 0) {
                exit(0);
            }
        }
        /* Reap every child before starting the next round. */
        for (int i = 0; i < count; i++) {
            wait(NULL);
        }
        sleep(1);
    }
    return 0;
}
Thanks a lot.
-Qiao