[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250604080339.GB35970@noisy.programming.kicks-ass.net>
Date: Wed, 4 Jun 2025 10:03:39 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Leo Yan <leo.yan@....com>
Cc: Yeoreum Yun <yeoreum.yun@....com>, mingo@...hat.com, mingo@...nel.org,
acme@...nel.org, namhyung@...nel.org, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
irogers@...gle.com, adrian.hunter@...el.com,
kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, David Wang <00107082@....com>
Subject: Re: [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx
On Tue, Jun 03, 2025 at 04:44:14PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 03, 2025 at 03:00:40PM +0100, Leo Yan wrote:
>
> > > + if (event->state > PERF_EVENT_STATE_OFF)
> > > + perf_cgroup_event_disable(event, ctx);
> > > +
> >
> > As we discussed, seems to me, the issue is caused by an ambigous state
> > machine transition:
> >
> > When a PMU event state is PERF_EVENT_STATE_EXIT, the current code does
> > not transite the state to PERF_EVENT_STATE_OFF. As a result, the
> > list_del_event() function skips to clean up cgroup pointer for non OFF
> > states. This is different from the code prior to the commit a3c3c6667,
> > which transits states EXIT -> INACTIVE -> OFF.
>
> Right.
>
> > My suggestion is not reliable. Roughly read code, except for the
> > PERF_EVENT_STATE_EXIT case, I think other error cases should also clean
> > up the cgroup pointer. The reason is I don't see other places to
> > clean up the cgroup pointer for these error cases:
> >
> > PERF_EVENT_STATE_REVOKED
> > PERF_EVENT_STATE_DEAD
>
> Those should be done here; on the first transition into these states.
>
> > Only in the PERF_EVENT_STATE_ERROR state, we don't need to cleanup
> > cgroup as this has already been handled in merge_sched_in().
> >
> > So a correct condition would be:
> >
> > if (event->state > PERF_EVENT_STATE_OFF ||
> > event->state <= PERF_EVENT_STATE_EXIT)
> > perf_cgroup_event_disable(event, ctx);
>
> I'm too tired to get my head straight. I'll look tomorrow.
Right; had a sleep. Lets do this :-)
So the normal states are:
ACTIVE ---.
^ |
| |
sched_{in,out}() |
| |
v |
,---> INACTIVE <--+
| |
| {dis,en}able()
sched_in() |
| OFF <--+
| |
`---> ERROR ---'
That is:
sched_in: INACTIVE -> {ACTIVE,ERROR}
sched_out: ACTIVE -> INACTIVE
disable: {ACTIVE,INACTIVE} -> OFF
enable: {OFF,ERROR} -> INACTIVE
Where OFF/ERROR are 'disable' and have this perf_cgroup_event_disable()
called.
Then we have {EXIT,REVOKED,DEAD} states which are various shades of
defunct events:
- EXIT means task that the even was assigned to died, but child events
still live, and further children can still be created. But the event
itself will never be active again. It can only transition to
{REVOKED,DEAD};
- REVOKED means the PMU the event was associated with is gone; all
functionality is stopped but the event is still alive. Can only
transition to DEAD;
- DEAD event really is DYING tearing down state and freeing bits.
And now we have the sitation that __perf_remove_from_context() can do:
{ACTIVE,INACTIVE,OFF,ERROR} -> {OFF,EXIT,REVOKED,DEAD}
Where the {OFF,ERROR} -> * transition already have
perf_cgroup_event_disable(), but the {ACTIVE,INACTIVE} -> * part has
not.
The condition:
if (event->state > PERF_EVENT_STATE_OFF)
captures exactly that {ACTIVE,INACTIVE} that still needs the cgroup
disable. Every other state is already a disabled state.
Agreed?
Also, I should probably stick most of all this in a comment somewhere.
Powered by blists - more mailing lists