Message-ID: <20260109072451.2843331-1-dengjun.su@mediatek.com>
Date: Fri, 9 Jan 2026 15:24:47 +0800
From: Dengjun Su <dengjun.su@...iatek.com>
To: <peterz@...radead.org>
CC: <angelogioacchino.delregno@...labora.com>, <bsegall@...gle.com>,
<dengjun.su@...iatek.com>, <dietmar.eggemann@....com>,
<haiqiang.gong@...iatek.com>, <juri.lelli@...hat.com>,
<linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
<linux-mediatek@...ts.infradead.org>, <matthias.bgg@...il.com>,
<mgorman@...e.de>, <mike.zhang@...iatek.com>, <mingo@...hat.com>,
<peijun.huang@...iatek.com>, <rostedt@...dmis.org>,
<vincent.guittot@...aro.org>, <vschneid@...hat.com>
Subject: Re: [PATCH] sched/rt: fix incorrect schedstats for rt thread
On Thu, 2026-01-08 at 12:16 +0100, Peter Zijlstra wrote:
> On Thu, Jan 08, 2026 at 11:13:07AM +0800, Dengjun Su wrote:
> > For an RT thread, only 'set_next_task_rt' calls
> > 'update_stats_wait_end_rt' to update schedstats information.
> > However, during RT migration,
> > 'update_stats_wait_start_rt' is called twice, which
> > makes the values of wait_max and wait_sum incorrect.
>
> Right, that loses time. Also note that I think dl has the same issue.
Hi Peter,
Thanks for the feedback. Yes, sorry for missing the dl class;
I will update it in V2.
>
> > The specific output is as follows:
> > $ cat /proc/6046/task/6046/sched | grep wait
> > wait_start : 0.000000
> > wait_max : 496717.080029
> > wait_sum : 7921540.776553
> >
> > Add 'update_stats_wait_end_rt' in 'update_stats_dequeue_rt' to
> > update schedstats information in dequeue_task.
>
> This needs a few more words on why this is correct -- notably it took me
> a little time to find the 'task_on_rq_migrating()' case in
> __update_stats_wait_end() which makes this not actually 'end'.
>
> But then the corresponding clause in __update_stats_wait_start() gives
> me a headache:
>
> 'wait_start > prev_wait_start'
>
> I mean, wtf. Should that not equally be using task_on_rq_migrating()?
>
> Can you please take a hard look at all that and fix up things all-round?
>
For a migrating task, the complete schedstats update flow should be:
__update_stats_wait_start() [enter queue A, stage 1] ->
__update_stats_wait_end() [leave queue A, stage 2] ->
__update_stats_wait_start() [enter queue B, stage 3] ->
__update_stats_wait_end() [start running on queue B, stage 4]
Stage 1: prev_wait_start is 0, so wait_start ends up recording the
time of entering queue A.
Stage 2: task_on_rq_migrating(p) is true, so wait_start is updated to
the waiting time on queue A.
Stage 3: prev_wait_start is the waiting time on queue A and wait_start
is the time of entering queue B, which is expected to be greater than
prev_wait_start. Under this condition, wait_start is updated to
(the moment of entering queue B) - (the waiting time on queue A).
Stage 4: the final wait time = (time of starting to run on queue B) -
(time of entering queue B) + (waiting time on queue A) =
(waiting time on queue B) + (waiting time on queue A).
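The stage-by-stage arithmetic can be written out as below. The symbols are introduced here only for illustration: t_A is the time of entering queue A, t_A' the time of being dequeued from A for migration, t_B the time of entering queue B, t_r the time of starting to run on B, w_A = t_A' - t_A, w_B = t_r - t_B, and ws_i is the value of wait_start after stage i. For simplicity it assumes both runqueues share one clock, whereas in the kernel each rq has its own rq_clock().

```latex
\begin{aligned}
\text{stage 1:}\quad & ws_1 = t_A - 0 = t_A \\
\text{stage 2:}\quad & ws_2 = t_A' - ws_1 = t_A' - t_A = w_A \\
\text{stage 3:}\quad & ws_3 = t_B - ws_2 = t_B - w_A \qquad (\text{requires } t_B > w_A) \\
\text{stage 4:}\quad & \Delta = t_r - ws_3 = (t_r - t_B) + w_A = w_B + w_A
\end{aligned}
```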
The current problem is that, for RT (and DL) tasks, nothing calls
__update_stats_wait_end() in stage 2, so wait_start is never converted
into the waiting time on queue A. As a result, the final computed wait
time = (waiting time on queue B) + (the moment of entering queue A),
which leads to incorrect wait_max and wait_sum.
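To make the difference concrete, here is a small user-space model of the flow. It is a hypothetical sketch, not kernel code: the model_* names and struct model_stats are invented for illustration, the functions only mirror the arithmetic of __update_stats_wait_start() and __update_stats_wait_end() for a task, and a single monotonic clock is assumed for both queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the wait fields of struct sched_statistics. */
struct model_stats {
	uint64_t wait_start;
	uint64_t wait_max;
	uint64_t wait_sum;
	uint64_t wait_count;
};

/* Mirrors the wait_start arithmetic of __update_stats_wait_start(). */
static void model_wait_start(struct model_stats *s, uint64_t now)
{
	uint64_t wait_start = now;
	uint64_t prev_wait_start = s->wait_start;

	/* Fold in any wait time carried over from before a migration. */
	if (wait_start > prev_wait_start)
		wait_start -= prev_wait_start;

	s->wait_start = wait_start;
}

/* Mirrors __update_stats_wait_end(); 'migrating' models task_on_rq_migrating(). */
static void model_wait_end(struct model_stats *s, uint64_t now, bool migrating)
{
	uint64_t delta = now - s->wait_start;

	if (migrating) {
		/* Stage 2: stash the wait accumulated so far on queue A. */
		s->wait_start = delta;
		return;
	}

	/* Stage 4: account the total wait and reset wait_start. */
	if (delta > s->wait_max)
		s->wait_max = delta;
	s->wait_count++;
	s->wait_sum += delta;
	s->wait_start = 0;
}

/*
 * Run the four stages and return what ends up in wait_sum.
 * call_stage2 == false models the missing stage 2 described above.
 */
uint64_t model_migration(uint64_t t_enter_a, uint64_t t_leave_a,
			 uint64_t t_enter_b, uint64_t t_run_b,
			 bool call_stage2)
{
	struct model_stats s = { 0 };

	/* The model assumes one monotonic clock shared by both queues. */
	assert(t_enter_a <= t_leave_a && t_leave_a <= t_enter_b &&
	       t_enter_b <= t_run_b);

	model_wait_start(&s, t_enter_a);             /* stage 1 */
	if (call_stage2)
		model_wait_end(&s, t_leave_a, true); /* stage 2 */
	model_wait_start(&s, t_enter_b);             /* stage 3 */
	model_wait_end(&s, t_run_b, false);          /* stage 4 */

	return s.wait_sum;
}
```

With entry to A at 1000, dequeue from A at 1300, entry to B at 1400 and running at 1500, the full flow accounts 400 (300 waited on A plus 100 on B), whereas skipping stage 2 accounts 1100 (100 on B plus the entry timestamp 1000).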
For __update_stats_wait_end(), task_on_rq_migrating(p) is needed to
distinguish stage 2 from stage 4 because they take different paths.
For __update_stats_wait_start(), however, there is no need to
distinguish stage 1 from stage 3: in stage 1 prev_wait_start is 0, so
the same subtraction leaves wait_start unchanged.
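As for the condition wait_start > prev_wait_start, I think it is more
of a safeguard against statistical deviations caused by clock
inconsistencies: it keeps the unsigned subtraction from wrapping around
when prev_wait_start is not smaller than the current rq clock.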
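The effect of that guard on the stored value can be seen in isolation with a tiny sketch. The function adjusted_wait_start() and its parameters are hypothetical; it only reproduces the comparison and the u64 subtraction from __update_stats_wait_start(), and says nothing about which outcome is preferable for the final statistics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'now' stands for rq_clock() of the runqueue being
 * entered, 'prev' for the stored prev_wait_start, and 'use_guard'
 * toggles the wait_start > prev_wait_start check.
 */
uint64_t adjusted_wait_start(uint64_t now, uint64_t prev, bool use_guard)
{
	if (use_guard && !(now > prev))
		return now; /* guard fails: keep the raw timestamp */

	/* If prev > now and the guard is off, this wraps modulo 2^64. */
	return now - prev;
}

/* Small self-check exercising both paths. */
void check_guard_examples(void)
{
	/* Normal case: the carried-over wait is smaller than the clock. */
	assert(adjusted_wait_start(1000, 300, true) == 700);

	/* prev > now: the guard keeps the timestamp as-is ... */
	assert(adjusted_wait_start(50, 300, true) == 50);

	/* ... while the unguarded subtraction wraps to a huge value. */
	assert(adjusted_wait_start(50, 300, false) == UINT64_MAX - 249);
}
```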
Thanks