lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aA9ic6m6WAcmVBAw@pavilion.home>
Date: Mon, 28 Apr 2025 13:11:47 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
	"Liang, Kan" <kan.liang@...ux.intel.com>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ian Rogers <irogers@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Namhyung Kim <namhyung@...nel.org>,
	Ravi Bangoria <ravi.bangoria@....com>,
	linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 2/4] perf: Fix irq work dereferencing garbage

Le Thu, Apr 24, 2025 at 06:30:24PM +0200, Peter Zijlstra a écrit :
> On Thu, Apr 24, 2025 at 06:11:26PM +0200, Frederic Weisbecker wrote:
> > @@ -13940,29 +13941,36 @@ perf_event_exit_event(struct perf_event *event,
> >  		 * Do destroy all inherited groups, we don't care about those
> >  		 * and being thorough is better.
> >  		 */
> > -		detach_flags |= DETACH_GROUP | DETACH_CHILD;
> > +		prd.detach_flags |= DETACH_GROUP | DETACH_CHILD;
> >  		mutex_lock(&parent_event->child_mutex);
> >  	}
> >  
> >  	if (revoke)
> > -		detach_flags |= DETACH_GROUP | DETACH_REVOKE;
> > +		prd.detach_flags |= DETACH_GROUP | DETACH_REVOKE;
> >  
> > -	perf_remove_from_context(event, detach_flags);
> > +	perf_remove_from_context(event, &prd);
> 
> Isn't all this waay to complicated?
> 
> That is, to modify state we need both ctx->mutex and ctx->lock, and this
> is what __perf_remove_from_context() has, but because of this, holding
> either one of those locks is sufficient to read the state -- it cannot
> change.
> 
> And here we already hold ctx->mutex.
> 
> So can't we simply do:
> 
> 	old_state = event->attach_state;
> 	perf_remove_from_context(event, detach_flags);
> 
> 	// do whatever with old_state

Right, the locking scenario is just a bit more complicated.
Most flags are set on init or with both ctx mutex and lock.
But:

_ PERF_ATTACH_CHILD is set instead with parent child_mutex and ctx lock.
_ PERF_ATTACH_ITRACE is set from pmu::start(). Thus from the event context
  with just interrupt disabled. It's probably enough to synchronize against
  initialization and remove_from_context IPIs but perf_event_exit_event() needs
  some care.

So we must hold both ctx mutex and child_mutex (although the pmus_srcu thing
should make that PERF_ATTACH_CHILD thing visible but let's keep things obvious).
And also have WRITE_ONCE() / READ_ONCE() to take care about PERF_ATTACH_ITRACE,
which we don't care about anyway.

Now this looks like this:

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7bcb02ffb93a..7278ca731a55 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -208,7 +208,6 @@ static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
 }
 
 #define TASK_TOMBSTONE ((void *)-1L)
-#define EVENT_TOMBSTONE ((void *)-1L)
 
 static bool is_kernel_event(struct perf_event *event)
 {
@@ -2338,12 +2337,6 @@ static void perf_child_detach(struct perf_event *event)
 
 	sync_child_event(event);
 	list_del_init(&event->child_list);
-	/*
-	 * Cannot set to NULL, as that would confuse the situation vs
-	 * not being a child event. See for example unaccount_event().
-	 */
-	event->parent = EVENT_TOMBSTONE;
-	put_event(parent_event);
 }
 
 static bool is_orphaned_event(struct perf_event *event)
@@ -5705,7 +5698,7 @@ static void put_event(struct perf_event *event)
 	_free_event(event);
 
 	/* Matches the refcount bump in inherit_event() */
-	if (parent && parent != EVENT_TOMBSTONE)
+	if (parent)
 		put_event(parent);
 }
 
@@ -9998,7 +9991,7 @@ void perf_event_text_poke(const void *addr, const void *old_bytes,
 
 void perf_event_itrace_started(struct perf_event *event)
 {
-	event->attach_state |= PERF_ATTACH_ITRACE;
+	WRITE_ONCE(event->attach_state, event->attach_state | PERF_ATTACH_ITRACE);
 }
 
 static void perf_log_itrace_start(struct perf_event *event)
@@ -13922,10 +13915,7 @@ perf_event_exit_event(struct perf_event *event,
 {
 	struct perf_event *parent_event = event->parent;
 	unsigned long detach_flags = DETACH_EXIT;
-	bool is_child = !!parent_event;
-
-	if (parent_event == EVENT_TOMBSTONE)
-		parent_event = NULL;
+	unsigned int attach_state;
 
 	if (parent_event) {
 		/*
@@ -13942,6 +13932,8 @@ perf_event_exit_event(struct perf_event *event,
 		 */
 		detach_flags |= DETACH_GROUP | DETACH_CHILD;
 		mutex_lock(&parent_event->child_mutex);
+		/* PERF_ATTACH_ITRACE might be set concurrently */
+		attach_state = READ_ONCE(event->attach_state);
 	}
 
 	if (revoke)
@@ -13951,18 +13943,25 @@ perf_event_exit_event(struct perf_event *event,
 	/*
 	 * Child events can be freed.
 	 */
-	if (is_child) {
-		if (parent_event) {
-			mutex_unlock(&parent_event->child_mutex);
-			/*
-			 * Kick perf_poll() for is_event_hup();
-			 */
-			perf_event_wakeup(parent_event);
+	if (parent_event) {
+		mutex_unlock(&parent_event->child_mutex);
+		/*
+		 * Kick perf_poll() for is_event_hup();
+		 */
+		perf_event_wakeup(parent_event);
+
+		/*
+		 * Match the refcount initialization. Make sure it doesn't happen
+		 * twice if pmu_detach_event() calls it on an already exited task.
+		 */
+		if (attach_state & PERF_ATTACH_CHILD) {
 			/*
 			 * pmu_detach_event() will have an extra refcount.
+			 * perf_pending_task() might have one too.
 			 */
 			put_event(event);
 		}
+
 		return;
 	}
 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ