linux-kernel - Re: [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250604080339.GB35970@noisy.programming.kicks-ass.net>
Date: Wed, 4 Jun 2025 10:03:39 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Leo Yan <leo.yan@....com>
Cc: Yeoreum Yun <yeoreum.yun@....com>, mingo@...hat.com, mingo@...nel.org,
	acme@...nel.org, namhyung@...nel.org, mark.rutland@....com,
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
	irogers@...gle.com, adrian.hunter@...el.com,
	kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org,
	linux-kernel@...r.kernel.org, David Wang <00107082@....com>
Subject: Re: [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx

On Tue, Jun 03, 2025 at 04:44:14PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 03, 2025 at 03:00:40PM +0100, Leo Yan wrote:
> 
> > > +	if (event->state > PERF_EVENT_STATE_OFF)
> > > +		perf_cgroup_event_disable(event, ctx);
> > > +
> > 
> > As we discussed, seems to me, the issue is caused by an ambigous state
> > machine transition:
> > 
> > When a PMU event state is PERF_EVENT_STATE_EXIT, the current code does
> > not transite the state to PERF_EVENT_STATE_OFF. As a result, the
> > list_del_event() function skips to clean up cgroup pointer for non OFF
> > states. This is different from the code prior to the commit a3c3c6667,
> > which transits states EXIT -> INACTIVE -> OFF.
> 
> Right.
> 
> > My suggestion is not reliable. Roughly read code, except for the
> > PERF_EVENT_STATE_EXIT case, I think other error cases should also clean
> > up the cgroup pointer.  The reason is I don't see other places to
> > clean up the cgroup pointer for these error cases:
> > 
> >   PERF_EVENT_STATE_REVOKED
> >   PERF_EVENT_STATE_DEAD
> 
> Those should be done here; on the first transition into these states.
> 
> > Only in the PERF_EVENT_STATE_ERROR state, we don't need to cleanup
> > cgroup as this has already been handled in merge_sched_in().
> > 
> > So a correct condition would be:
> > 
> >     if (event->state > PERF_EVENT_STATE_OFF ||
> >         event->state <= PERF_EVENT_STATE_EXIT)
> >         perf_cgroup_event_disable(event, ctx);
> 
> I'm too tired to get my head straight. I'll look tomorrow.


Right; had a sleep. Lets do this :-)


So the normal states are:

	    ACTIVE    ---.
	       ^         |
	       |         |
       sched_{in,out}()  |
	       |         |
	       v         |
      ,---> INACTIVE  <--+
      |                  |
      |            {dis,en}able()
  sched_in()             |
      |       OFF     <--+
      |                  |
      `--->  ERROR    ---'

That is:

sched_in:	INACTIVE          -> {ACTIVE,ERROR}
sched_out:	ACTIVE            -> INACTIVE
disable:	{ACTIVE,INACTIVE} -> OFF
enable:         {OFF,ERROR}       -> INACTIVE

Where OFF/ERROR are 'disable' and have this perf_cgroup_event_disable()
called.

Then we have {EXIT,REVOKED,DEAD} states which are various shades of
defunct events:

 - EXIT means task that the even was assigned to died, but child events
   still live, and further children can still be created. But the event
   itself will never be active again. It can only transition to
   {REVOKED,DEAD};

 - REVOKED means the PMU the event was associated with is gone; all
   functionality is stopped but the event is still alive. Can only
   transition to DEAD;

 - DEAD event really is DYING tearing down state and freeing bits.

And now we have the sitation that __perf_remove_from_context() can do:

  {ACTIVE,INACTIVE,OFF,ERROR} -> {OFF,EXIT,REVOKED,DEAD}

Where the {OFF,ERROR} -> * transition already have
perf_cgroup_event_disable(), but the {ACTIVE,INACTIVE} -> * part has
not.

The condition:

  if (event->state > PERF_EVENT_STATE_OFF)

captures exactly that {ACTIVE,INACTIVE} that still needs the cgroup
disable. Every other state is already a disabled state.


Agreed?

Also, I should probably stick most of all this in a comment somewhere.