Message-ID: <Z2KxQmmoCWyA7raR@e129823.arm.com>
Date: Wed, 18 Dec 2024 11:25:54 +0000
From: Yeoreum Yun <yeoreum.yun@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, acme@...nel.org, amhyung@...nel.org,
mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, irogers@...gle.com, adrian.hunter@...el.com,
kan.liang@...ux.intel.com, james.clark@...aro.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
nd@....com
Subject: Re: [PATCH 1/1] events/core: fix error run/enable ratio display
Hi Peter.
>
> git log --oneline kernel/events/core.c
>
> Compare result with your subject.
>
> On Wed, Dec 18, 2024 at 08:46:56AM +0000, Yeoreum Yun wrote:
> > When running the command below (with a core PMU), perf sometimes
> > prints a wrong run/enable ratio for each event.
> >
> > sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 2s
> > ...
> > armv8_pmuv3_0/event=0x08/: -1: 1081702126 2289429840 2174835740
> > armv8_pmuv3_1/event=0x08/: -1: 794080238 1950025700 847648440
> > armv8_pmuv3_0/event=0x08/: 1138698008 2289429840 2174835740
> > armv8_pmuv3_1/event=0x08/: 1826791390 1950025700 847648440
> >
> > Performance counter stats for 'stress-ng --pthread=2 -t 2s':
> >
> > 1,138,698,008 armv8_pmuv3_0/event=0x08/ (94.99%)
> > 1,826,791,390 armv8_pmuv3_1/event=0x08/ (43.47%)
>
> At this point I've no idea what the error is. You start by saying there
> is an error, but then give no clue.
The error I want to point out is the printed ratios (94.99% and
43.47%).
Since each PMU is a core PMU, and with the above command both events
belong to the same task context, the total enabled time should be the
same for both (it is the second of the three numbers per event in the
verbose output).
But:
armv8_pmuv3_0/event=0x08/: 2289429840
armv8_pmuv3_1/event=0x08/: 1950025700
That means some time is missing from total_time_enabled, which is
saved in each event.
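
To make that concrete, here is a minimal standalone sketch (not perf's
actual code; the ena/run values are taken from the -vvv output above)
of the run/ena percentage perf stat prints:

#include <stdio.h>

int main(void)
{
	/* enabled time and running time from the -vvv output above */
	unsigned long long ena0 = 2289429840ULL, run0 = 2174835740ULL;
	unsigned long long ena1 = 1950025700ULL, run1 = 847648440ULL;

	printf("armv8_pmuv3_0: %.2f%%\n", 100.0 * run0 / ena0); /* 94.99% */
	printf("armv8_pmuv3_1: %.2f%%\n", 100.0 * run1 / ena1); /* 43.47% */

	/*
	 * Both events sit in the same task context, so ena1 should equal
	 * ena0 (2289429840). With the missing time added back, the second
	 * ratio would be ~37.02% rather than 43.47%.
	 */
	return 0;
}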
>
> > This happens because total_time_enabled is missing time for an
> > inactive event: its enabled time is not updated at event_sched_out()
> > because its state is PERF_EVENT_STATE_INACTIVE rather than
> > PERF_EVENT_STATE_ACTIVE.
>
> time tracking doesn't happen at sched calls -- you'll find a grand total
> of 0 perf_event_update_time() calls in them.
>
> Also, things like group_sched_out() don't even call into
> event_sched_out() when !ACTIVE.
>
> So why is event_sched_out() the right place to change here?
>
In the case below, the last time update happens at
__perf_remove_from_context(), which calls event_sched_out() (e0 and
e1 are child events), and total_time_enabled is tracked per event.
So I think it is the right place to change.
But for exact tracking, I think group_sched_out() should also be
allowed to update the event time when the event is !ACTIVE; see the
sketch below.
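
Roughly the shape I mean (only a sketch against my reading of
event_sched_out(); not the actual patch):

/*
 * Sketch only, not the actual patch: let the scheduling-out path
 * refresh the enabled time of an INACTIVE event before bailing out,
 * instead of returning without any time update.
 */
static void
event_sched_out(struct perf_event *event, struct perf_event_context *ctx)
{
	...
	if (event->state != PERF_EVENT_STATE_ACTIVE) {
		/*
		 * An INACTIVE event was still enabled while the context
		 * was scheduled in, so its total_time_enabled must keep
		 * advancing even though the event never ran.
		 */
		if (event->state == PERF_EVENT_STATE_INACTIVE)
			perf_event_update_time(event);
		return;
	}
	...
}

group_sched_out() would need its !ACTIVE early return relaxed in the
same way, otherwise it never reaches the events of the group.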
> Also, I've still no clue under what condition all this is supposed to
> have happened :/
>
> > For example, suppose there are two events:
> > e0: with pmu0, which supports only cpu0
> > e1: with pmu1, which supports only cpu1.
> > Then:
> >
> > CPU0 CPU1 CPU0
> > | in | out | | in | out | | in | out (exit) |
> > ------------------------------------------------------------------------------
> > ctx->time | 0 | t1 | | t1 | t1 + t2 | | t1 + t2 | t1 + t2 + t3 |
> > ------------------------------------------------------------------------------
> > e0->ena | 0 | t1 | | t1 | t1 | | t1 + t2 | t1 + t2 + t3 |
> > ------------------------------------------------------------------------------
> > e0->run | 0 | t1 | | t1 | t1 | | t1 | t1 + t3 |
> > ------------------------------------------------------------------------------
> > e1->ena | 0 | 0 | | t1 | t1 + t2 | | t1 + t2 | t1 + t2 |
> > ------------------------------------------------------------------------------
> > e1->run | 0 | 0 | | 0 | t2 | | t2 | t2 |
> > ------------------------------------------------------------------------------
> >
> > On CPU0, the state of e1 is PERF_EVENT_STATE_INACTIVE since pmu1 is
> > not supported on CPU0. So event_sched_out() fails to update its
> > total_time_enabled, and that produces the wrong run/ena ratio.
>
> Sorry, I have no idea what you're saying. I can't relate the table to
> the text in any meaningful way.
Sorry for the lack of explanation.
Please look at the first CPU0 period:
at the first ctx_sched_in(), ctx->time is 0. e0 becomes ACTIVE, while
e1 is flipped from ACTIVE back to !ACTIVE because adding it fails. At
ctx_sched_out(), ctx->time is advanced by the running time t1; e0->ena
is updated to t1, but e1->ena is not updated because e1 is !ACTIVE.
The first "out" column describes this situation.
Next, when this task->ctx runs on CPU1:
at ctx_sched_in(), e0->ena stays the same and e0 goes from ACTIVE to
!ACTIVE, while e1->ena is updated to t1 when its state is set to
ACTIVE. At ctx_sched_out(), e0->ena does not change because its state
is !ACTIVE, and e1->ena becomes t1 + t2 after running for t2.
Finally, when this task->ctx runs on CPU0 again:
at ctx_sched_in(), e0->ena is updated to t1 + t2 as e0 becomes ACTIVE,
but e1->ena stays at t1 + t2.
When the task exits, e0->ena is updated to t1 + t2 + t3, but e1->ena
remains t1 + t2 because e1 is !ACTIVE.
So e1->ena is missing t3 at exit time, which leads to the wrong ratio
being printed.
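
To show the accounting concretely, here is a small userspace model of
the table above (hypothetical values, t1 = t2 = t3 = 100; it only
mimics the update rules, it is not kernel code):

#include <stdio.h>

/* toy event: accrued enabled time plus the ctx time of the last update */
struct ev { unsigned long long ena, tstamp; int active; };

static void update_time(struct ev *e, unsigned long long ctx_time)
{
	e->ena += ctx_time - e->tstamp;
	e->tstamp = ctx_time;
}

/*
 * sched-in: the event becomes ACTIVE only if its PMU works on this
 * CPU; the state change refreshes the time, as perf_event_set_state()
 * would.
 */
static void sched_in(struct ev *e, unsigned long long ctx_time, int can_add)
{
	if (can_add) {
		update_time(e, ctx_time);
		e->active = 1;
	}
}

/*
 * sched-out: only an ACTIVE event gets its time refreshed before going
 * back to inactive -- this is the behaviour under discussion.
 */
static void sched_out(struct ev *e, unsigned long long ctx_time)
{
	if (e->active) {
		update_time(e, ctx_time);
		e->active = 0;
	}
}

int main(void)
{
	struct ev e0 = {0}, e1 = {0};
	unsigned long long t = 0;

	/* CPU0 for t1: pmu0 works here, pmu1 does not */
	sched_in(&e0, t, 1); sched_in(&e1, t, 0);
	t += 100;
	sched_out(&e0, t); sched_out(&e1, t);

	/* CPU1 for t2: only pmu1 works here */
	sched_in(&e0, t, 0); sched_in(&e1, t, 1);
	t += 100;
	sched_out(&e0, t); sched_out(&e1, t);

	/* back on CPU0 for t3, then the task exits */
	sched_in(&e0, t, 1); sched_in(&e1, t, 0);
	t += 100;
	sched_out(&e0, t); sched_out(&e1, t);

	printf("e0->ena = %llu\n", e0.ena);	/* 300 = t1+t2+t3       */
	printf("e1->ena = %llu\n", e1.ena);	/* 200 = t1+t2, t3 lost */
	return 0;
}

If sched_out() here also refreshed inactive events, e1->ena would come
out as 300 as well, which is what the patch aims for.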
This patch fixes the above case by updating the enabled time in
event_sched_out(), which is called from perf_event_exit_event() at
exit. But I think group_sched_out() should also be allowed to update
the enabled time of !ACTIVE events so it is tracked properly.
Thanks.