[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240613185906.31082-1-sieberf@amazon.com>
Date: Thu, 13 Jun 2024 20:59:06 +0200
From: Fernand Sieber <sieberf@...zon.com>
To:
CC: <linux-perf-users@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
Fernand Sieber <sieberf@...zon.com>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>
Subject: [PATCH] perf: Timehist account sch delay for scheduled out running
When using perf timehist, sch delay is only computed for a waking task,
not for a pre empted task. This patches addresses this problem.
Example of `perf timehist` report before the patch for `stress` task
competing with each other.
First column is wait time, second column sch delay, third column
runtime.
```
1.492060 [0000] s stress[81] 1.999 0.000 2.000 R next: stress[83]
1.494060 [0000] s stress[83] 2.000 0.000 2.000 R next: stress[81]
1.496060 [0000] s stress[81] 2.000 0.000 2.000 R next: stress[83]
1.498060 [0000] s stress[83] 2.000 0.000 1.999 R next: stress[81]
```
After the patch, it looks like this (note that all wait time is now sch
delay instead of zero):
```
1.492060 [0000] s stress[81] 1.999 1.999 2.000 R next: stress[83]
1.494060 [0000] s stress[83] 2.000 2.000 2.000 R next: stress[81]
1.496060 [0000] s stress[81] 2.000 2.000 2.000 R next: stress[83]
1.498060 [0000] s stress[83] 2.000 2.000 1.999 R next: stress[81]
```
In timehist:
* wait time represents the duration waiting for any system resource
* sch delay represents the duration waiting for cpu system resources
This is based on perf comments (dt_wait = wait time, dt_delay = sch
delay):
```
/*
* Explanation of delta-time stats:
*
* t = time of current schedule out event
* tprev = time of previous sched out event
* also time of schedule-in event for current task
* last_time = time of last sched change event for current task
* (i.e, time process was last scheduled out)
* ready_to_run = time of wakeup for current task
*
* -----|------------|------------|------------|------
* last ready tprev t
* time to run
*
* |-------- dt_wait --------|
* |- dt_delay -|-- dt_run --|
*
* dt_run = run time of current task
* dt_wait = time between last schedule out event for task and tprev
* represents time spent off the cpu
* dt_delay = time between wakeup and schedule-in of task
*/
```
The problem with the current logic is that last time is only set when
waking a task. Therefore it is not set for a pre empted task. To fix
this, we set last time to the current sample time if a scheduled out
task (on the switch tracepoint) is in state running.
Signed-off-by: Fernand Sieber <sieberf@...zon.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
---
tools/perf/builtin-sched.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 5977c49ae2c7..7422c930abaf 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2659,7 +2659,10 @@ static int timehist_sched_change_event(struct perf_tool *tool,
tr->last_state = state;
/* sched out event for task so reset ready to run time */
- tr->ready_to_run = 0;
+ if (state == 'R')
+ tr->ready_to_run = t;
+ else
+ tr->ready_to_run = 0;
}
evsel__save_time(evsel, sample->time, sample->cpu);
--
2.40.1
Powered by blists - more mailing lists