linux-kernel - [PATCH v2] events/core: fix acoount failure for event's total_enable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20241220100202.804062-1-yeoreum.yun@arm.com>
Date: Fri, 20 Dec 2024 10:02:02 +0000
From: Yeoreum Yun <yeoreum.yun@....com>
To: peterz@...radead.org,
	mingo@...hat.com,
	acme@...nel.org,
	namhyung@...nel.org,
	mark.rutland@....com,
	alexander.shishkin@...ux.intel.com,
	jolsa@...nel.org,
	irogers@...gle.com,
	adrian.hunter@...el.com,
	kan.liang@...ux.intel.com
Cc: linux-perf-users@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Yeoreum Yun <yeoreum.yun@....com>
Subject: [PATCH v2] events/core: fix acoount failure for event's total_enable_time

The perf core code fails to account for total_enable_time of event
when its state is inactive.

Here is the error case for failure to account for total_enable_time for
core event:

sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 2s
...
armv8_pmuv3_0/event=0x08/: 1138698008 2289429840 2174835740
armv8_pmuv3_1/event=0x08/: 1826791390 1950025700 847648440

 Performance counter stats for 'stress-ng --pthread=2 -t 2s':

     1,138,698,008      armv8_pmuv3_0/event=0x08/                                               (94.99%)
     1,826,791,390      armv8_pmuv3_1/event=0x08/                                               (43.47%)

Since above two events are belong to the same task context and
mutually-exclusive per cpu (they couldn't be active at the same time on the same cpu),
the total_enable_time should be the same (the 2nd number among 3 numbers)
and the summation of ratio should be 100%.

This account failure of total_enable_time for event could happen in below sequence.

1. two event opened with:
   - first event (e0) is opened with pmu0(p0) which could be added on CPU0.
   - second event (e1) is opened with pmu1(p1) which could be added on CPU1.
   - these two event belongs to the same task_ctx.

   at this point:
     task_ctx->time = 0
     e0->total_enable_time = 0
     e0->total_running_time = 0
     e1->total_enable_time = 0
     e1->total_running_time = 0

2. the task_ctx is sched in CPU0.
   - In ctx_sched_in(), the task_ctx->time doesn't updated.
   - In event_sched_in() e0 is activated so, its state becomes ACTIVE.
   - In event_sched_in() e1 is activated, but soon becomes !ACTIVE
     because pmu1 doesn't support CPU1 so it failed in pmu1->add().

   at this point:
     task_ctx->time = 0
     e0->total_enable_time = 0
     e0->total_running_time = 0
     e1->total_enable_time = 0
     e1->total_running_time = 0

3. the task_ctx is sched out from CPU0. and suppose scheulded time as t1.
   - In ctx_sched_out(), the task_ctx->time updated with t1.
   - e0 was ACTIVE, so In event_sched_out(),
     e0->total_enable_time and e0->total_running_time are updated with t1.
   - e1 was !ACTIVE, so it doesn't update
     e1->total_enable_time and e1->total_running_time.

   at this point:
     task_ctx->time = t1
     e0->total_enable_time = t1
     e0->total_running_time = t1
     e1->total_enable_time = 0
     e1->total_running_time = 0

4. the task_ctx is sched in to CPU1.
   - task_ctx->time doesn't changed.
   - In event_sched_in() e0 is activated, but soon becomes !ACTIVE
     because pmu1 doesn't support CPU1 so it failed in pmu1->add().
   - In event_sched_in() e1 is activated, and when it changes event
     state to ACTIVE, e1->total_enable_time is updated with
     task_ctx->time.

   at this point:
   task_ctx->time = t1
   e0->total_enable_time = t1
   e0->total_running_time = t1
   e1->total_enable_time = t1
   e1->total_running_time = 0

5. the task_ctx is sched out from CPU1. and suppose scheulded time as t2.
   - In ctx_sched_out(), the task_ctx->time updated with t1 + t2.
   - e0 was !ACTIVE, so it doesn't update
     e0->total_enable_time and e0->total_running_time.
   - e1 was ACTIVE, so In event_sched_out(),
     e1->total_enable_time and e1->total_running_time are updated with t2.

   at this point:
     task_ctx->time = t1 + t2
     e0->total_enable_time = t1
     e0->total_running_time = t1
     e1->total_enable_time = t1 + t2
     e1->total_running_time = t2

6. the task_ctx is sched in CPU0 again.
   - task_ctx->time doesn't changed.
   - In event_sched_in() e0 is activated, and when it changes event
     state to ACTIVE, e0->total_enable_time is updated with
     task_ctx->time.
   - In event_sched_in() e1 is activated, but soon becomes !ACTIVE
     because pmu1 doesn't support CPU1 so it failed in pmu1->add().

   at this point:
     task_ctx->time = t1 + t2
     e0->total_enable_time = t1 + t2
     e0->total_running_time = t1
     e1->total_enable_time = t1 + t2
     e1->total_running_time = t2

7. the task exit and task_ctx is sched out. and suppose scheulded time as t3.
   - In ctx_sched_out(), the task_ctx->time updated with t1 + t2 + t3.
   - e0 was ACTIVE, so In event_sched_out(),
     e0->total_enable_time and e1->total_running_time are updated with t3.
   - e1 was !ACTIVE, so it doesn't update
     e1->total_enable_time and e1->total_running_time.

   at this point:
     task_ctx->time = t1 + t2 + t3
     e0->total_enable_time = t1 + t2 +t3
     e0->total_running_time = t1 + t3
     e1->total_enable_time = t1 + t2
     e1->total_running_time = t2

In step 7, amount of t3 enable time is missing in e1.
So, the perf prints error ratio for total_running_time/total_enable_time
for e1.

Here is the summary of change of time as timeline:

               CPU0                  CPU1                     CPU0
          |  in | out |       | in | out     |       | in      | out (exit)   |
------------------------------------------------------------------------------
ctx->time |  0  | t1  |       | t1 | t1 + t2 |       | t1 + t2 | t1 + t2 + t3 |
------------------------------------------------------------------------------
e0->ena   |  0  | t1  |       | t1 | t1      |       | t1 + t2 | t1 + t2 + t3 |
------------------------------------------------------------------------------
e0->run   |  0  | t1  |       | t1 | t1      |       | t1 + t2 | t1 + t3      |
------------------------------------------------------------------------------
e1->ena   |  0  | 0   |       | t1 | t1 + t2 |       | t1 + t2 | t1 + t2      |
------------------------------------------------------------------------------
e1->run   |  0  | 0   |       | 0  | t2      |       | t1 + t2 | t2           |
------------------------------------------------------------------------------

To address this, update total_enable_time in event_sched_out() when event state
is PERF_EVENT_STATE_INACTIVE.

After this patch, this problem is gone like:

sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 10s
...
armv8_pmuv3_0/event=0x08/: 15396770398 32157963940 21898169000
armv8_pmuv3_1/event=0x08/: 22428964974 32157963940 10259794940

 Performance counter stats for 'stress-ng --pthread=2 -t 10s':

    15,396,770,398      armv8_pmuv3_0/event=0x08/                                               (68.10%)
    22,428,964,974      armv8_pmuv3_1/event=0x08/                                               (31.90%)

Signed-off-by: Yeoreum Yun <yeoreum.yun@....com>
---
 kernel/events/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 065f9188b44a..308850449a84 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2311,6 +2311,11 @@ event_sched_out(struct perf_event *event, struct perf_event_context *ctx)
 	WARN_ON_ONCE(event->ctx != ctx);
 	lockdep_assert_held(&ctx->lock);

+	if (event->state == PERF_EVENT_STATE_INACTIVE) {
+		perf_event_update_time(event);
+		return;
+	}
+
 	if (event->state != PERF_EVENT_STATE_ACTIVE)
 		return;

@@ -2351,7 +2356,7 @@ group_sched_out(struct perf_event *group_event, struct perf_event_context *ctx)
 {
 	struct perf_event *event;

-	if (group_event->state != PERF_EVENT_STATE_ACTIVE)
+	if (group_event->state < PERF_EVENT_STATE_INACTIVE)
 		return;

 	perf_assert_pmu_disabled(group_event->pmu_ctx->pmu);
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}