linux-kernel - Re: [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250604101821.GC8020@e132581.arm.com>
Date: Wed, 4 Jun 2025 11:18:21 +0100
From: Leo Yan <leo.yan@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Yeoreum Yun <yeoreum.yun@....com>, mingo@...hat.com, mingo@...nel.org,
	acme@...nel.org, namhyung@...nel.org, mark.rutland@....com,
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
	irogers@...gle.com, adrian.hunter@...el.com,
	kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org,
	linux-kernel@...r.kernel.org, David Wang <00107082@....com>
Subject: Re: [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx

On Wed, Jun 04, 2025 at 10:03:39AM +0200, Peter Zijlstra wrote:

[...]

> > > My suggestion is not reliable. Roughly read code, except for the
> > > PERF_EVENT_STATE_EXIT case, I think other error cases should also clean
> > > up the cgroup pointer.  The reason is I don't see other places to
> > > clean up the cgroup pointer for these error cases:
> > > 
> > >   PERF_EVENT_STATE_REVOKED
> > >   PERF_EVENT_STATE_DEAD
> > 
> > Those should be done here; on the first transition into these states.
> > 
> > > Only in the PERF_EVENT_STATE_ERROR state, we don't need to cleanup
> > > cgroup as this has already been handled in merge_sched_in().
> > > 
> > > So a correct condition would be:
> > > 
> > >     if (event->state > PERF_EVENT_STATE_OFF ||
> > >         event->state <= PERF_EVENT_STATE_EXIT)
> > >         perf_cgroup_event_disable(event, ctx);
> > 
> > I'm too tired to get my head straight. I'll look tomorrow.
> 
> 
> Right; had a sleep. Lets do this :-)

Thanks a lot for the state machine diagram.

> So the normal states are:
> 
>             ACTIVE    ---.
>                ^         |
>                |         |
>        sched_{in,out}()  |
>                |         |
>                v         |
>       ,---> INACTIVE  <--+
>       |                  |
>       |            {dis,en}able()
>   sched_in()             |
>       |       OFF     <--+
>       |                  |
>       `--->  ERROR    ---'
> 
> That is:
> 
> sched_in:     INACTIVE          -> {ACTIVE,ERROR}

Strictly speaking, sched_in() can keep INACTIVE state if an event is
rotated.

> sched_out:    ACTIVE            -> INACTIVE
> disable:      {ACTIVE,INACTIVE} -> OFF
> enable:       {OFF,ERROR}       -> INACTIVE
> 
> Where OFF/ERROR are 'disable' and have this perf_cgroup_event_disable()
> called.
> 
> Then we have {EXIT,REVOKED,DEAD} states which are various shades of
> defunct events:
> 
>  - EXIT means task that the even was assigned to died, but child events
>    still live, and further children can still be created. But the event
>    itself will never be active again. It can only transition to
>    {REVOKED,DEAD};
> 
>  - REVOKED means the PMU the event was associated with is gone; all
>    functionality is stopped but the event is still alive. Can only
>    transition to DEAD;
> 
>  - DEAD event really is DYING tearing down state and freeing bits.
> 
> And now we have the sitation that __perf_remove_from_context() can do:
> 
>   {ACTIVE,INACTIVE,OFF,ERROR} -> {OFF,EXIT,REVOKED,DEAD}

A detailed transition is:

  Case 1: {ACTIVE} -> {INACTIVE} -> {OFF,EXIT,REVOKED,DEAD}
  Case 2: {ERROR} -> {ERROR,EXIT,REVOKED,DEAD}
  Case 3: {OFF} -> {OFF,EXIT,REVOKED,DEAD}

> Where the {OFF,ERROR} -> * transition already have
> perf_cgroup_event_disable(), but the {ACTIVE,INACTIVE} -> * part has
> not.

Just a minor concern.

I noticed perf_put_aux_event() sets the ERROR state for sibling events
of an AUX event. IIUC, the AUX event is the group leader, so we only
need to clean up the cgroup pointer for the AUX event, and simply set
the ERROR state for its sibling events, correct?

> The condition:
> 
>   if (event->state > PERF_EVENT_STATE_OFF)
> 
> captures exactly that {ACTIVE,INACTIVE} that still needs the cgroup
> disable. Every other state is already a disabled state.
> 
> 
> Agreed?

Except above concern, otherwise, the conclusion makes sense to me.

> Also, I should probably stick most of all this in a comment somewhere.

Totally agree. The comment would be very useful!

Thanks,
Leo