lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250502102918.GW4198@noisy.programming.kicks-ass.net>
Date: Fri, 2 May 2025 12:29:18 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
	"Liang, Kan" <kan.liang@...ux.intel.com>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ian Rogers <irogers@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Namhyung Kim <namhyung@...nel.org>,
	Ravi Bangoria <ravi.bangoria@....com>,
	linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 2/4] perf: Fix irq work dereferencing garbage

On Mon, Apr 28, 2025 at 01:11:47PM +0200, Frederic Weisbecker wrote:
> Le Thu, Apr 24, 2025 at 06:30:24PM +0200, Peter Zijlstra a écrit :
> > On Thu, Apr 24, 2025 at 06:11:26PM +0200, Frederic Weisbecker wrote:
> > > @@ -13940,29 +13941,36 @@ perf_event_exit_event(struct perf_event *event,
> > >  		 * Do destroy all inherited groups, we don't care about those
> > >  		 * and being thorough is better.
> > >  		 */
> > > -		detach_flags |= DETACH_GROUP | DETACH_CHILD;
> > > +		prd.detach_flags |= DETACH_GROUP | DETACH_CHILD;
> > >  		mutex_lock(&parent_event->child_mutex);
> > >  	}
> > >  
> > >  	if (revoke)
> > > -		detach_flags |= DETACH_GROUP | DETACH_REVOKE;
> > > +		prd.detach_flags |= DETACH_GROUP | DETACH_REVOKE;
> > >  
> > > -	perf_remove_from_context(event, detach_flags);
> > > +	perf_remove_from_context(event, &prd);
> > 
> > Isn't all this waay to complicated?
> > 
> > That is, to modify state we need both ctx->mutex and ctx->lock, and this
> > is what __perf_remove_from_context() has, but because of this, holding
> > either one of those locks is sufficient to read the state -- it cannot
> > change.
> > 
> > And here we already hold ctx->mutex.
> > 
> > So can't we simply do:
> > 
> > 	old_state = event->attach_state;
> > 	perf_remove_from_context(event, detach_flags);
> > 
> > 	// do whatever with old_state
> 
> Right, the locking scenario is just a bit more complicated.
> Most flags are set on init or with both ctx mutex and lock.
> But:
> 
> _ PERF_ATTACH_CHILD is set instead with parent child_mutex and ctx lock.

Looks trivial to add ctx->mutex to the mix here. Its not like that's a
fast path. But let me go read your patch before deciding if that's
actually needed :-)

> _ PERF_ATTACH_ITRACE is set from pmu::start(). Thus from the event context
>   with just interrupt disabled. It's probably enough to synchronize against
>   initialization and remove_from_context IPIs but perf_event_exit_event() needs
>   some care.

Right, that's a little tricky indeed. As stated, we don't care about the
bit, but the write shouldn't mess things up.

> So we must hold both ctx mutex and child_mutex (although the pmus_srcu thing
> should make that PERF_ATTACH_CHILD thing visible but let's keep things obvious).
> And also have WRITE_ONCE() / READ_ONCE() to take care about PERF_ATTACH_ITRACE,
> which we don't care about anyway.
> 
> Now this looks like this:
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7bcb02ffb93a..7278ca731a55 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -208,7 +208,6 @@ static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
>  }
>  
>  #define TASK_TOMBSTONE ((void *)-1L)
> -#define EVENT_TOMBSTONE ((void *)-1L)
>  
>  static bool is_kernel_event(struct perf_event *event)
>  {
> @@ -2338,12 +2337,6 @@ static void perf_child_detach(struct perf_event *event)
>  
>  	sync_child_event(event);
>  	list_del_init(&event->child_list);
> -	/*
> -	 * Cannot set to NULL, as that would confuse the situation vs
> -	 * not being a child event. See for example unaccount_event().
> -	 */
> -	event->parent = EVENT_TOMBSTONE;
> -	put_event(parent_event);
>  }
>  
>  static bool is_orphaned_event(struct perf_event *event)
> @@ -5705,7 +5698,7 @@ static void put_event(struct perf_event *event)
>  	_free_event(event);
>  
>  	/* Matches the refcount bump in inherit_event() */
> -	if (parent && parent != EVENT_TOMBSTONE)
> +	if (parent)
>  		put_event(parent);
>  }
>  
> @@ -9998,7 +9991,7 @@ void perf_event_text_poke(const void *addr, const void *old_bytes,
>  
>  void perf_event_itrace_started(struct perf_event *event)
>  {
> -	event->attach_state |= PERF_ATTACH_ITRACE;
> +	WRITE_ONCE(event->attach_state, event->attach_state | PERF_ATTACH_ITRACE);
>  }
>  
>  static void perf_log_itrace_start(struct perf_event *event)
> @@ -13922,10 +13915,7 @@ perf_event_exit_event(struct perf_event *event,
>  {
>  	struct perf_event *parent_event = event->parent;
>  	unsigned long detach_flags = DETACH_EXIT;
> -	bool is_child = !!parent_event;
> -
> -	if (parent_event == EVENT_TOMBSTONE)
> -		parent_event = NULL;
> +	unsigned int attach_state;
>  
>  	if (parent_event) {
>  		/*
> @@ -13942,6 +13932,8 @@ perf_event_exit_event(struct perf_event *event,
>  		 */
>  		detach_flags |= DETACH_GROUP | DETACH_CHILD;
>  		mutex_lock(&parent_event->child_mutex);
> +		/* PERF_ATTACH_ITRACE might be set concurrently */
> +		attach_state = READ_ONCE(event->attach_state);
>  	}
>  
>  	if (revoke)
> @@ -13951,18 +13943,25 @@ perf_event_exit_event(struct perf_event *event,
>  	/*
>  	 * Child events can be freed.
>  	 */
> -	if (is_child) {
> -		if (parent_event) {
> -			mutex_unlock(&parent_event->child_mutex);
> -			/*
> -			 * Kick perf_poll() for is_event_hup();
> -			 */
> -			perf_event_wakeup(parent_event);
> +	if (parent_event) {
> +		mutex_unlock(&parent_event->child_mutex);
> +		/*
> +		 * Kick perf_poll() for is_event_hup();
> +		 */
> +		perf_event_wakeup(parent_event);

Should not this perf_event_wakeup() be inside the next if() as well?
doing anything on parent_event when !ATTACH_CHILD seems dodgy.

> +
> +		/*
> +		 * Match the refcount initialization. Make sure it doesn't happen
> +		 * twice if pmu_detach_event() calls it on an already exited task.
> +		 */
> +		if (attach_state & PERF_ATTACH_CHILD) {
>  			/*
>  			 * pmu_detach_event() will have an extra refcount.
> +			 * perf_pending_task() might have one too.
>  			 */
>  			put_event(event);
>  		}
> +
>  		return;
>  	}

This is a *much* saner patch, thank you!

So the thing I worried about... which is why I chose for the TOMBSTONE
thing, is that this second invocation will now dereference parent_event,
even though we've already released our reference count on it. 

This is essentially a use-after-free.

The thing that makes it work is RCU. And I think we're good, since the
fail case is two perf_event_exit_event() invocations on the same event,
separated by an RCU grace period, and I don't think this can happen.

But its a shame we can't reliably detect that.. Oh well.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ