Message-ID: <20240806075630.GL37996@noisy.programming.kicks-ass.net>
Date: Tue, 6 Aug 2024 09:56:30 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: "Liang, Kan" <kan.liang@...ux.intel.com>,
Ingo Molnar <mingo@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Ravi Bangoria <ravi.bangoria@....com>,
Stephane Eranian <eranian@...gle.com>,
Ian Rogers <irogers@...gle.com>, Mingwei Zhang <mizhang@...gle.com>
Subject: Re: [PATCH v2] perf/core: Optimize event reschedule for a PMU
On Mon, Aug 05, 2024 at 11:19:48PM -0700, Namhyung Kim wrote:
> On Mon, Aug 5, 2024 at 7:58 AM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Mon, Aug 05, 2024 at 11:20:58AM +0200, Peter Zijlstra wrote:
> > > On Fri, Aug 02, 2024 at 02:30:19PM -0400, Liang, Kan wrote:
> > > > > @@ -2792,7 +2833,14 @@ static int __perf_install_in_context(void *info)
> > > > >  	if (reprogram) {
> > > > >  		ctx_sched_out(ctx, EVENT_TIME);
> > > > >  		add_event_to_ctx(event, ctx);
> > > > > -		ctx_resched(cpuctx, task_ctx, get_event_type(event));
> > > > > +		if (ctx->nr_events == 1) {
> > > > > +			/* The first event needs to set ctx->is_active. */
> > > > > +			ctx_resched(cpuctx, task_ctx, NULL, get_event_type(event));
> > > > > +		} else {
> > > > > +			ctx_resched(cpuctx, task_ctx, event->pmu_ctx->pmu,
> > > > > +				    get_event_type(event));
> > > > > +			ctx_sched_in(ctx, EVENT_TIME);
> > > >
> > > > The changelog doesn't say much about the time difference. As I
> > > > understand it, the time is shared among PMUs in the same ctx.
> > > > When perf does ctx_resched(), the time is deducted.
> > > > There is no problem with stopping and restarting the global time
> > > > when perf re-schedules all PMUs.
> > > > But if only one PMU is re-scheduled while others are still running,
> > > > stopping and restarting the global time may be a problem. Other
> > > > PMUs will be impacted.
> > >
> > > So afaict, since we hold ctx->lock, nobody can observe that EVENT_TIME
> > > was cleared for a little while.
> > >
> > > So the point was to make all the various ctx_sched_out() calls have the
> > > same timestamp. It does this by clearing EVENT_TIME first. Then the
> > > first ctx_sched_in() will set it again, and later ctx_sched_in() won't
> > > touch time.
> > >
> > > That leaves a little hole, because the time between
> > > ctx_sched_out(EVENT_TIME) and the first ctx_sched_in() gets lost.
> > >
> > > This isn't typically a problem, but not very nice. Let me go find an
> > > alternative solution for this. The simple update I did saturday is
> > > broken as per the perf test.
> >
> > OK, it took a little longer than I would have liked, and it's not
> > entirely pretty, but it seems to pass 'perf test'.
> >
> > Please look at: queue.git perf/resched
> >
> > I'll try and post it all tomorrow.
>
> Thanks for doing this. But some of my tests are still failing.
> I'm seeing some system-wide events are not counted.
> Let me take a deeper look at it.
Does this help? What would be an easy reproducer?
---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c67fc43fe877..4a04611333d9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -179,23 +179,27 @@ static void perf_ctx_lock(struct perf_cpu_context *cpuctx,
 	}
 }
 
+static inline void __perf_ctx_unlock(struct perf_event_context *ctx)
+{
+	/*
+	 * If ctx_sched_in() didn't again set any ALL flags, clean up
+	 * after ctx_sched_out() by clearing is_active.
+	 */
+	if (ctx->is_active & EVENT_FROZEN) {
+		if (!(ctx->is_active & EVENT_ALL))
+			ctx->is_active = 0;
+		else
+			ctx->is_active &= ~EVENT_FROZEN;
+	}
+	raw_spin_unlock(&ctx->lock);
+}
+
 static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
 			    struct perf_event_context *ctx)
 {
-	if (ctx) {
-		/*
-		 * If ctx_sched_in() didn't again set any ALL flags, clean up
-		 * after ctx_sched_out() by clearing is_active.
-		 */
-		if (ctx->is_active & EVENT_FROZEN) {
-			if (!(ctx->is_active & EVENT_ALL))
-				ctx->is_active = 0;
-			else
-				ctx->is_active &= ~EVENT_FROZEN;
-		}
-		raw_spin_unlock(&ctx->lock);
-	}
-	raw_spin_unlock(&cpuctx->ctx.lock);
+	if (ctx)
+		__perf_ctx_unlock(ctx);
+	__perf_ctx_unlock(&cpuctx->ctx);
 }
 
 #define TASK_TOMBSTONE ((void *)-1L)