Message-Id: <20241217061317.92811-1-zhouzihan30@jd.com>
Date: Tue, 17 Dec 2024 14:13:19 +0800
From: zhouzihan30 <15645113830zzh@...il.com>
To: vincent.guittot@...aro.org
Cc: 15645113830zzh@...il.com,
bsegall@...gle.com,
dietmar.eggemann@....com,
juri.lelli@...hat.com,
linux-kernel@...r.kernel.org,
mgorman@...e.de,
mingo@...hat.com,
peterz@...radead.org,
rostedt@...dmis.org,
vschneid@...hat.com,
yaozhenguo@...com,
zhouzihan30@...com
Subject: Re: [PATCH] sched: Forward deadline for early tick
Thank you, Vincent Guittot, for clearing up my confusion about the tick
error: why it is always less than 1ms on some machines.
It is normal for a tick not to be exactly 1ms, because of software or
hardware effects, but on some machines the tick is always less than 1ms,
which is a bit strange. I could not give a good explanation for this
before, but now I know the reason.
The root cause is CONFIG_IRQ_TIME_ACCOUNTING.
I used bpftrace to monitor how rq clock and rq clock task change on the
system:
kprobe:update_rq_clock_task /pid == 6388/
{
    @rq = (struct rq *)arg0;
    $delta = (int64)arg1;
    @clock_pre = @rq->clock_task;
    printf("rq clock delta is %llu\n", $delta);
}
kretprobe:update_rq_clock_task /pid == 6388/
{
    $clock_post = @rq->clock_task;
    printf("rq clock task delta: %llu\n", $clock_post - @clock_pre);
}
result:
rq clock delta is 999994
rq clock task delta: 996616
rq clock delta is 1000026
rq clock task delta: 996550
rq clock delta is 1000047
rq clock task delta: 996716
rq clock delta is 999995
rq clock task delta: 996454
rq clock delta is 1000058
rq clock task delta: 996621
rq clock delta is 999987
rq clock task delta: 996457
rq clock delta is 1000047
rq clock task delta: 996621
rq clock delta is 999966
rq clock task delta: 996594
rq clock delta is 1000071
rq clock task delta: 996470
rq clock delta is 1000073
rq clock task delta: 996586
rq clock delta is 999958
rq clock task delta: 996446
rq clock delta is 1000018
rq clock task delta: 996574
rq clock delta is 999993
rq clock task delta: 996908
rq clock delta is 1000037
rq clock task delta: 996547
As Vincent Guittot said:
< the delta of rq_clock_task is always
< less than 1ms on my system but the delta of rq_clock is sometimes
< above and sometime below 1ms
According to the kernel function update_rq_clock_task(), both
CONFIG_IRQ_TIME_ACCOUNTING and CONFIG_PARAVIRT_TIME_ACCOUNTING often
make the delta of rq_clock_task lower than 1ms, because the function
subtracts IRQ time and steal time from the delta before adding it to
clock_task. I collected 13016 deltas: 47% of the rq_clock deltas were
less than 1ms, but every rq_clock_task delta was less than 1ms.
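The counting above can be reproduced on the 14 samples quoted in this
mail. This is only a minimal sketch in Python; the list names and the
1ms threshold constant are mine, not part of any kernel tooling, and the
full 13016-sample run is not reproduced here.

```python
# Per-tick deltas in nanoseconds, copied from the bpftrace output in this mail.
rq_clock = [999994, 1000026, 1000047, 999995, 1000058, 999987, 1000047,
            999966, 1000071, 1000073, 999958, 1000018, 999993, 1000037]
rq_clock_task = [996616, 996550, 996716, 996454, 996621, 996457, 996621,
                 996594, 996470, 996586, 996446, 996574, 996908, 996547]

TICK_NS = 1_000_000  # nominal 1ms tick (assumes HZ=1000)


def count_below_tick(deltas):
    """Number of deltas strictly shorter than the nominal tick."""
    return sum(1 for d in deltas if d < TICK_NS)


for name, deltas in (("rq clock", rq_clock), ("rq clock task", rq_clock_task)):
    below = count_below_tick(deltas)
    print(f"{name}: {below}/{len(deltas)} deltas below 1ms")
```

On these 14 samples, 6 rq clock deltas are below 1ms (about 43%, close
to the 47% seen on the full run), and all 14 rq clock task deltas are
below 1ms.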
As a comparative experiment, I turned off those config options and
re-checked the clock changes. The values of rq clock and rq clock task
became completely consistent. However, according to perf, there are
still errors in the tick (slice=3ms):
      time     cpu  task name     wait time  sch delay   run time
                    [tid/pid]        (msec)     (msec)     (msec)
----------  ------  ------------  ---------  ---------  ---------
110.436513  [0001]  perf[1414]        0.000      0.000      0.000
110.440490  [0001]  bash[1341]        0.000      0.000      3.977
110.441490  [0001]  bash[1344]        0.000      0.000      0.999
110.441548  [0001]  perf[1414]        4.976      0.000      0.058
110.445491  [0001]  bash[1344]        0.058      0.000      3.942
110.449490  [0001]  bash[1341]        5.000      0.000      3.999
110.452490  [0001]  bash[1344]        3.999      0.000      2.999
110.456491  [0001]  bash[1341]        2.999      0.000      4.000
110.460489  [0001]  bash[1344]        4.000      0.000      3.998
110.463490  [0001]  bash[1341]        3.998      0.000      3.001
110.467493  [0001]  bash[1344]        3.001      0.000      4.002
110.471490  [0001]  bash[1341]        4.002      0.000      3.996
110.474489  [0001]  bash[1344]        3.996      0.000      2.999
110.477490  [0001]  bash[1341]        2.999      0.000      3.000
It seems that, whether or not CONFIG_IRQ_TIME_ACCOUNTING is enabled,
tick errors can make the run time vary randomly between 3 and 4ms.
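One way to see where the 3ms/4ms variation can come from is to add up
three consecutive tick periods and compare the sum with the slice. This
is a rough Python sketch under my own assumptions: the task starts
exactly on a tick, is only checked at ticks, and is preempted once its
run time reaches the slice. It is a simplified model, not the
scheduler's actual logic, and it reuses the rq clock deltas quoted
earlier in this mail as stand-in tick periods.

```python
# rq clock deltas per tick (ns), copied from the bpftrace output in this mail.
RQ_CLOCK_DELTAS_NS = [999994, 1000026, 1000047, 999995, 1000058, 999987,
                      1000047, 999966, 1000071, 1000073, 999958, 1000018,
                      999993, 1000037]
SLICE_NS = 3_000_000  # slice=3ms, as in the perf run


def runtime_after_three_ticks(deltas):
    """Run time accumulated at the third tick, for every possible start tick."""
    return [sum(deltas[i:i + 3]) for i in range(len(deltas) - 2)]


windows = runtime_after_three_ticks(RQ_CLOCK_DELTAS_NS)
short = [w for w in windows if w < SLICE_NS]

for w in windows:
    # Assumed rule: if the slice is not used up at tick 3, the task runs to tick 4.
    ticks = 3 if w >= SLICE_NS else 4
    print(f"run time at 3rd tick: {w} ns -> preempted at tick {ticks}")
```

Eleven of the twelve windows reach 3ms at the third tick, but one sums
to 2999969 ns, 31 ns short, so under this model that task would keep
running until the fourth tick, i.e. roughly 4ms.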
< This means that the task didn't effectively get its slice because of
< time spent in IRQ context. Would it be better to set a default slice
< slightly lower than an integer number of tick
We once considered subtracting a small amount from the slice when it is
set. For example, if someone sets 3ms, we could subtract 0.1ms and make
it 2.9ms. But this is not a good solution: if someone sets 3.1ms, should
we use 2.9ms or 3ms? There does not seem to be a particularly good
choice, and it may lead to even larger errors in the system.
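To make the trade-off concrete, here is a small Python sketch that
counts how many ticks a task runs before a given slice is used up. The
assumptions are mine: the task is only checked at tick boundaries, each
tick advances the task clock by a fixed amount, and 996600 ns is just a
round value near the rq clock task deltas measured above.

```python
NOMINAL_TICK_NS = 1_000_000   # ideal 1ms tick
MEASURED_TICK_NS = 996_600    # assumed: close to the measured rq clock task delta


def ticks_needed(slice_ns, tick_ns):
    """Smallest number of whole ticks whose total time reaches slice_ns."""
    return -(-slice_ns // tick_ns)  # integer ceiling division


for slice_ns in (2_900_000, 3_000_000, 3_100_000):
    ideal = ticks_needed(slice_ns, NOMINAL_TICK_NS)
    measured = ticks_needed(slice_ns, MEASURED_TICK_NS)
    print(f"slice {slice_ns / 1e6:.1f}ms: "
          f"{ideal} ticks (ideal tick), {measured} ticks (measured tick)")
```

With a 3ms slice, the slightly short measured tick costs a whole extra
tick, while a 2.9ms slice does not. A 3.1ms request needs four ticks
either way, so rounding it down to 2.9ms or 3ms would change what the
user asked for.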
Changing the default value is a simple solution; in fact, we did that on
the old kernel we used (we simply set it to 2.9ms). On our old 6.6
kernel, the tick error caused processes with the same weight to get
different run times. The new kernel does not have this problem, but we
still submitted this patch because we thought unexpected behavior might
occur in other scenarios. However, apart from the kernel's default
value, different OSes seem to have different behaviors, and the default
value is often an integer number of ticks... so we still hope to solve
this problem in the kernel.