Message-ID: <CAKfTPtBqnYE1EMxpyHnXm0S83Vez0qrceORxSkY+2jaoGHv9eQ@mail.gmail.com>
Date: Thu, 19 Dec 2024 16:53:22 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: zhouzihan30 <15645113830zzh@...il.com>
Cc: bsegall@...gle.com, dietmar.eggemann@....com, juri.lelli@...hat.com, 
	linux-kernel@...r.kernel.org, mgorman@...e.de, mingo@...hat.com, 
	peterz@...radead.org, rostedt@...dmis.org, vschneid@...hat.com, 
	yaozhenguo@...com, zhouzihan30@...com
Subject: Re: [PATCH] sched: Forward deadline for early tick

On Tue, 17 Dec 2024 at 07:14, zhouzihan30 <15645113830zzh@...il.com> wrote:
>
>
> Thank you, Vincent Guittot, for resolving my confusion about the tick
> error: why it is always less than 1ms on some machines.
>
> It is normal for a tick not to be exactly 1ms because of software or
> hardware effects, but on some machines the tick is always less than 1ms,
> which is a bit strange. I had no good explanation for it before, but now
> I know the reason: the root cause is CONFIG_IRQ_TIME_ACCOUNTING.
>
> I used bpftrace to monitor how the rq clock and rq clock task change on this system:
>
> // update_rq_clock_task(struct rq *rq, s64 delta)
> kprobe:update_rq_clock_task /pid == 6388/
> {
>     @rq = (struct rq *)arg0;
>     $delta = (int64)arg1;             // delta applied to rq->clock
>     @clock_pre = @rq->clock_task;     // clock_task before the update
>     printf("rq clock delta is %lld\n", $delta);
> }
>
> kretprobe:update_rq_clock_task /pid == 6388/
> {
>     $clock_post = @rq->clock_task;    // clock_task after the update
>     printf("rq clock task delta: %llu\n", $clock_post - @clock_pre);
> }
>
>
> result:
> rq clock delta is 999994
> rq clock task delta: 996616
> rq clock delta is 1000026
> rq clock task delta: 996550
> rq clock delta is 1000047
> rq clock task delta: 996716
> rq clock delta is 999995
> rq clock task delta: 996454
> rq clock delta is 1000058
> rq clock task delta: 996621
> rq clock delta is 999987
> rq clock task delta: 996457
> rq clock delta is 1000047
> rq clock task delta: 996621
> rq clock delta is 999966
> rq clock task delta: 996594
> rq clock delta is 1000071
> rq clock task delta: 996470
> rq clock delta is 1000073
> rq clock task delta: 996586
> rq clock delta is 999958
> rq clock task delta: 996446
> rq clock delta is 1000018
> rq clock task delta: 996574
> rq clock delta is 999993
> rq clock task delta: 996908
> rq clock delta is 1000037
> rq clock task delta: 996547
>
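For reference: in the trace above, rq clock advances by roughly
1,000,000 ns per tick while rq clock task advances by only ~996,500 ns,
so about 3,500 ns per tick is being carved out as IRQ time. A tiny
userspace toy model of that effect (the per-tick IRQ time below is an
illustrative assumption, and this is not the kernel code itself):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /*
     * Model: rq->clock advances by the raw tick delta, while
     * rq->clock_task advances by the delta minus the time accounted
     * to IRQ context (CONFIG_IRQ_TIME_ACCOUNTING).  3500 ns per tick
     * is an assumption chosen to roughly match the trace above.
     */
    uint64_t clock = 0, clock_task = 0;
    const uint64_t tick_ns = 1000000;   /* nominal 1ms tick */
    const uint64_t irq_ns = 3500;       /* assumed IRQ time per tick */

    for (int i = 0; i < 5; i++) {
        clock += tick_ns;
        clock_task += tick_ns - irq_ns;
        printf("rq clock delta is %llu\n", (unsigned long long)tick_ns);
        printf("rq clock task delta: %llu\n",
               (unsigned long long)(tick_ns - irq_ns));
    }
    printf("clock_task lags clock by %llu ns after 5 ticks\n",
           (unsigned long long)(clock - clock_task));
    return 0;
}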
>
> As Vincent Guittot said:
>
> < the delta of rq_clock_task is always
> < less than 1ms on my system but the delta of rq_clock is sometimes
> < above and sometime below 1ms
>
> According to the kernel function update_rq_clock_task, both
> CONFIG_IRQ_TIME_ACCOUNTING and CONFIG_PARAVIRT_TIME_ACCOUNTING often
> result in the delta of rq_clock_task being lower than 1ms. I counted
> 13016 delta samples; in the end, 47% of the rq_clock deltas were
> less than 1ms, but every rq_clock_task delta was less

Having the delta of rq_clock toggle above and below 1ms is normal because
of the clockevent precision: if the previous delta was longer than 1ms,
then the next one will be shorter. But the average over several ticks
remains 1ms, as in your trace above.
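(Summing the 14 "rq clock delta" values quoted above gives 14,000,270 ns,
i.e. an average of roughly 1,000,019 ns per tick.)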

>  than 1ms
>
> In order to run a comparative experiment, I turned off those config
> options and re-checked the clock changes. The values of rq clock and
> rq clock task then become completely consistent. However, according to
> the information from perf, there are still tick errors (slice=3ms):

Did you check that the whole tick was accounted to the task?
According to your trace of rq clock delta and rq clock task delta
above, the sum of 3 consecutive ticks is greater than 3ms for
rq clock delta in most cases, so I would assume that the sum of
delta_exec would be greater than 3ms as well after 3 ticks.
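As a quick sanity check of that, a throwaway userspace snippet (not
kernel code) that sums every window of 3 consecutive "rq clock delta"
values from the trace above:

#include <stdio.h>
#include <stdint.h>

/* The 14 "rq clock delta" values (ns) from the bpftrace output above. */
static const uint64_t delta_ns[] = {
    999994, 1000026, 1000047,  999995, 1000058, 999987, 1000047,
    999966, 1000071, 1000073,  999958, 1000018, 999993, 1000037,
};

int main(void)
{
    const int n = sizeof(delta_ns) / sizeof(delta_ns[0]);
    int above = 0;

    /* Sum each window of 3 consecutive ticks and compare against 3ms. */
    for (int i = 0; i + 2 < n; i++) {
        uint64_t sum = delta_ns[i] + delta_ns[i + 1] + delta_ns[i + 2];

        if (sum > 3000000)
            above++;
        printf("ticks %2d..%2d: %llu ns\n", i, i + 2,
               (unsigned long long)sum);
    }
    printf("%d of %d windows exceed 3000000 ns\n", above, n - 2);
    return 0;
}

Here 10 of the 12 windows come out above 3,000,000 ns (one lands exactly
on it and one just below), which matches the trace.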

>
>       time    cpu  task name     wait time  sch delay   run time
>                    [tid/pid]        (msec)     (msec)     (msec)
> ---------- ------  ------------  ---------  ---------  ---------
> 110.436513 [0001]  perf[1414]        0.000      0.000      0.000
> 110.440490 [0001]  bash[1341]        0.000      0.000      3.977
> 110.441490 [0001]  bash[1344]        0.000      0.000      0.999
> 110.441548 [0001]  perf[1414]        4.976      0.000      0.058
> 110.445491 [0001]  bash[1344]        0.058      0.000      3.942
> 110.449490 [0001]  bash[1341]        5.000      0.000      3.999
> 110.452490 [0001]  bash[1344]        3.999      0.000      2.999
> 110.456491 [0001]  bash[1341]        2.999      0.000      4.000
> 110.460489 [0001]  bash[1344]        4.000      0.000      3.998
> 110.463490 [0001]  bash[1341]        3.998      0.000      3.001
> 110.467493 [0001]  bash[1344]        3.001      0.000      4.002
> 110.471490 [0001]  bash[1341]        4.002      0.000      3.996
> 110.474489 [0001]  bash[1344]        3.996      0.000      2.999
> 110.477490 [0001]  bash[1341]        2.999      0.000      3.000
>
>
> It seems that regardless of whether CONFIG_IRQ_TIME_ACCOUNTING is
> enabled, tick errors can cause the run time to vary randomly
> between 3 and 4ms.
>
>
>
> < This means that the task didn't effectively get its slice because of
> < time spent in IRQ context. Would it be better to set a default slice
> < slightly lower than an integer number of tick
>
> We once considered subtracting a little from the slice when setting it:
> for example, if someone sets 3ms, we could subtract 0.1ms and make it
> 2.9ms. But this is not a good solution. If someone sets 3.1ms, should
> we use 2.9ms or 3ms? There doesn't seem to be a particularly good
> option, and it may lead to even larger errors in the system.

And we would end up giving a task less than its slice, which it could
have set to that value for a good reason.


>
> Changing the default value is a simple solution; in fact, we did that
> on the old kernel we used (we just set it to 2.9ms. On our old 6.6
> kernel, the tick error caused processes with the same weight to get
> different run time; the new kernel does not have this problem, but we
> still submitted this patch because we thought unexpected behavior might
> occur in other scenarios). However, apart from the kernel's default
> value, different OSes seem to behave differently, and the default value
> is often an integer number of ticks... so we still hope to solve this
> problem in the kernel.
>
>
>
>
>
>
>
