Message-ID: <e9b3ea46-550e-e850-57a1-f5df2c7b33d1@loongson.cn>
Date: Wed, 28 Aug 2019 13:02:40 +0800
From: 乔崇 <qiaochong@...ngson.cn>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: update_curr changed sum_exec_runtime to 1
when sum_exec_runtime is 0, because some kernel code uses sum_exec_runtime==0
to test whether a task has just been forked.
I found this when running a docker test script.
#-----script begin--
for((i=0; i<1000000; i++))
do
echo "=================================== test $i start
================================"
rm -rf /media/ram/docker/test/*
docker run -d --name test -h test \
	-v /media/docker/app/dataindex:/media/dataindex \
	-v /media/ram/docker/test:/run \
	-v /media/docker/app/test:/media/container \
	-v /media/wave:/media/wave:ro \
	-v /run/systemd/journal/dev-log:/dev/log \
	--ipc=host --log-driver=journald --init arpubt:mips64el-1.2
docker ps -s
docker stats --no-stream
cp /media/arp/testsvc.tar.gz /media/ram/docker/test/
docker exec -it test app-manager install testsvc
sleep 3
docker ps -s
docker stats --no-stream
docker exec -it test monit status
sleep 3
docker exec -it test app-manager disable testsvc
sleep 1
docker exec -it test monit status
sleep 3
docker exec -it test app-manager enable testsvc
sleep 1
docker exec -it test app-manager disable testsvc
sleep 1
docker exec -it test app-manager enable testsvc
sleep 1
docker exec -it test app-manager disable testsvc
sleep 1
docker exec -it test app-manager enable testsvc
sleep 1
docker ps -s
docker stats --no-stream
sleep 1
docker exec -it test app-manager uninstall testsvc
sleep 1
docker rm -f test
sleep 1
docker ps -s
docker stats --no-stream
echo "================================== test $i end
=================================="
done
#---- script end---
Sometimes, docker exec stalls for a long time.
testsvc is a busy-loop application:
int i;

int main(void)
{
	while (1)
		i++;
}
When using kgdb, I saw that there are two tasks on the run queue; the docker
exec task's vruntime is very large, and its value is almost the same as the
Linux cgroup's min_vruntime, like below:
CFS0 0x98000002de50f6e0, enter cgroup 0xffffffff81349900
CFS1 0x98000002dba65700, enter cgroup 0x98000002dbc4a700
  se:0x98000002dab95618 vruntime:49388638792 14348 14344 runc:[2:INIT]
     ra=0xffffffff80e8e110 sp=0x98000002d2b33d40 epc=0x12006b0f0
     usp=0xffeafc6810 task=0x98000002dab955b0 tinfo=0x98000002d2b30000
     uregs=0x98000002d2b33eb0 stat=0x1
CFS2 0x98000002dba57700, enter cgroup 0x98000000014bf100
  se:0x98000002daa92a78 vruntime:49391638792 (always run) 14266 13758 testsvc
     ra=0xffffffff80e8e110 sp=0x98000002d2cbfe30 epc=0x1200017f0
     usp=0xffffc8bc10 task=0x98000002daa92a10 tinfo=0x98000002d2cbc000
     uregs=0x98000002d2cbfeb0 stat=0x0
  se:0x98000002dab5b488 vruntime:242669408647 (large) 14349 14344 runc:[2:INIT]
     ra=0xffffffff80e8e110 sp=0x98000002d2b37ba0 epc=0x12006b3d8
     usp=0xffea7c6798 task=0x98000002dab5b420 tinfo=0x98000002d2b34000
     uregs=0x98000002d2b37eb0 stat=0x0
I added check code in sched_move_task and found that the task had already run
and was stopped (on_rq == 0), but its vruntime was not adjusted by
sched_move_task, because se->sum_exec_runtime == 0 (see the
vruntime_normalized() check quoted below).
I added debug code in update_curr:
	if (unlikely((s64)delta_exec <= 0)) {
		if (!curr->sum_exec_runtime && entity_is_task(curr)) {
			struct task_struct *p;

			p = container_of(curr, struct task_struct, se);
			if ((p->nvcsw + p->nivcsw) != 0) {
				/* task has context-switched before, yet
				 * sum_exec_runtime is still 0 */
				curr->sum_exec_runtime += 1;
				dump_stack();
			}
		}
		return;
	}
dump_stack() did indeed get called sometimes.
Now, after reading your reply, I see the reason:
unsigned long long __weak sched_clock(void)
{
	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
					* (NSEC_PER_SEC / HZ);
}
My kernel uses the default sched_clock, which uses jiffies as the scheduler
clock. I should use a high-resolution scheduler clock instead; then
update_curr's delta_exec will not be 0.
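To illustrate the effect (a minimal userspace sketch of my understanding, not
kernel code; HZ=250 is just an example value): the jiffies-based clock only
advances once per tick, so two reads within the same tick return the same
value, and update_curr sees delta_exec == 0.

#include <stdio.h>

#define HZ		250		/* example tick rate */
#define NSEC_PER_SEC	1000000000ULL

static unsigned long jiffies;		/* advances only once per tick */

/* same formula as the default __weak sched_clock() above,
 * without the INITIAL_JIFFIES offset */
static unsigned long long sched_clock(void)
{
	return (unsigned long long)jiffies * (NSEC_PER_SEC / HZ);
}

int main(void)
{
	unsigned long long exec_start, now;

	exec_start = sched_clock();	/* task starts running */
	/* task runs and is preempted before the next timer tick,
	 * so jiffies has not advanced */
	now = sched_clock();

	/* prints 0: update_curr() would return early and
	 * sum_exec_runtime would stay 0 even though the task ran */
	printf("delta_exec = %llu\n", now - exec_start);
	return 0;
}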
So thanks a lot, but I still think the curr->sum_exec_runtime check is not
accurate enough.
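For example (an untested sketch of what I mean, not a patch):
vruntime_normalized() could use the context-switch counters, as in my debug
code above, to decide whether the task has ever run, instead of relying on
sum_exec_runtime:

/* untested sketch: a task has never been scheduled if it has never
 * done a voluntary or involuntary context switch */
static bool task_never_ran(struct task_struct *p)
{
	return (p->nvcsw + p->nivcsw) == 0;
}

Then the first condition in vruntime_normalized() could test
task_never_ran(p) instead of !se->sum_exec_runtime.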
On 2019/8/27 10:36 PM, Peter Zijlstra wrote:
> On Mon, Aug 26, 2019 at 07:46:50PM +0800, QiaoChong wrote:
>> From: Chong Qiao <qiaochong@...ngson.cn>
>>
>> Such as:
>> cpu_cgroup_attach>
>> sched_move_task>
>> task_change_group_fair>
>> task_move_group_fair>
>> detach_task_cfs_rq>
>> vruntime_normalized>
>>
>> /*
>> * When !on_rq, vruntime of the task has usually NOT been normalized.
>> * But there are some cases where it has already been normalized:
>> *
>> * - A forked child which is waiting for being woken up by
>> * wake_up_new_task().
>> * - A task which has been woken up by try_to_wake_up() and
>> * waiting for actually being woken up by sched_ttwu_pending().
>> */
>> 	if (!se->sum_exec_runtime ||
>> 	    (p->state == TASK_WAKING && p->sched_remote_wakeup))
>> 		return true;
>>
>> p->se.sum_exec_runtime == 0 does not necessarily mean the task has not run (as for a forked child which is waiting to be woken up by wake_up_new_task()).
>>
>> A task may have been scheduled multiple times while p->se.sum_exec_runtime stays 0, because delta_exec may be 0 in update_curr.
>>
>> static void update_curr(struct cfs_rq *cfs_rq)
>> {
>> 	...
>> 	delta_exec = now - curr->exec_start;
>> 	if (unlikely((s64)delta_exec <= 0))
>> 		return;
>> 	...
>>
>> 	curr->sum_exec_runtime += delta_exec;
>> 	...
>> }
>>
>> The task has run and is now stopped (on_rq == 0) with a vruntime that has not been normalized, yet se->sum_exec_runtime == 0.
>> This makes vruntime_normalized() return true, so the vruntime is not normalized.
>> The task may therefore keep its old vruntime from the old cgroup, which may be much larger than its vruntime in the new cgroup.
>> That may keep the task off the run queue for a long time after it is woken up.
>>
>> Now I change sum_exec_runtime to 1 when sum_exec_runtime == 0 in update_curr, to make sum_exec_runtime non-zero.
> Have you actually observed this? It is very hard to have a 0 delta
> between two scheduling events.
>
--
乔崇 qiaochong@...ngson.cn
Wed, Aug 28, 2019 11:31:43 CST