Message-Id: <1470107099-132631-1-git-send-email-davidcc@google.com>
Date:	Mon,  1 Aug 2016 20:04:59 -0700
From:	David Carrillo-Cisneros <davidcc@...gle.com>
To:	linux-kernel@...r.kernel.org
Cc:	"x86@...nel.org" <x86@...nel.org>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andi Kleen <ak@...ux.intel.com>,
	Kan Liang <kan.liang@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Vegard Nossum <vegard.nossum@...il.com>,
	Paul Turner <pjt@...gle.com>,
	Stephane Eranian <eranian@...gle.com>,
	David Carrillo-Cisneros <davidcc@...gle.com>
Subject: [PATCH] perf/core: set cgroup for cpu contexts for new cgroup events

There is an optimization in perf_cgroup_sched_{in,out} that skips the
switch of cgroup events if the old and new cgroups of a task context
switch are the same (sketched after the list below). This optimization
interacts with the current code in two ways that can leave a CPU
context's cgroup pointer (cpuctx->cgrp) NULL even though a cgroup event
matching the current task exists. These are:

  1. On creation of the first cgroup event on a CPU: in the current code,
  cpuctx->cgrp is only set in perf_cgroup_sched_in, but due to the
  aforesaid optimization, perf_cgroup_sched_in will not run until the
  next cgroup switch on that CPU. That switch may happen late or never,
  depending on the system's number of cgroups, CPU load, etc.

  2. On deletion of the last cgroup event in a cpuctx: in list_del_event,
  cpuctx->cgrp is set to NULL. Any new cgroup event will not be scheduled
  in, because cpuctx->cgrp == NULL, until a cgroup switch occurs and
  perf_cgroup_sched_in is executed (updating cpuctx->cgrp).
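
For reference, a minimal sketch of the optimization in question, modeled
on perf_cgroup_sched_in() in kernel/events/core.c around this kernel
version (simplified; the helpers are the kernel's own, but comments and
details are mine):

  static inline void perf_cgroup_sched_in(struct task_struct *prev,
                                          struct task_struct *task)
  {
          struct perf_cgroup *cgrp1;
          struct perf_cgroup *cgrp2;

          rcu_read_lock();
          cgrp1 = perf_cgroup_from_task(task, NULL);
          cgrp2 = perf_cgroup_from_task(prev, NULL);

          /*
           * Only switch cgroup events when the task's cgroup actually
           * changes across this context switch; events were not
           * scheduled out otherwise. This skip is what can delay the
           * update of cpuctx->cgrp in problems 1 and 2 above.
           */
          if (cgrp1 != cgrp2)
                  perf_cgroup_switch(task, PERF_CGROUP_SWIN);

          rcu_read_unlock();
  }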

This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
mirroring what list_del_event does when removing a cgroup event from a
CPU context, as introduced in:
commit 68cacd29167b ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()")

With this patch, cpuctx->cgrp is always set/cleared when the first/last
cgroup event is installed in/removed from the CPU context. Having
cpuctx->cgrp correctly set from the moment an event is installed in the
context allows event_filter_match (sketched below) to work correctly
while scheduling events in and out, without relying on a cgroup switch
that may occur late (if ever).
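
For context, a simplified sketch of the check that depends on
cpuctx->cgrp, modeled on event_filter_match()/perf_cgroup_match() in
this kernel version (not the verbatim source; comments are mine):

  static inline bool perf_cgroup_match(struct perf_event *event)
  {
          struct perf_cpu_context *cpuctx = __get_cpu_context(event->ctx);

          /* The event does not care about cgroups: always a match. */
          if (!event->cgrp)
                  return true;

          /*
           * The event wants a specific cgroup but the CPU context is
           * not associated with one: no match. This is the check that
           * keeps failing for as long as cpuctx->cgrp is left NULL.
           */
          if (!cpuctx->cgrp)
                  return false;

          /* Match if cpuctx's cgroup is the event's or a descendant. */
          return cgroup_is_descendant(cpuctx->cgrp->css.cgroup,
                                      event->cgrp->css.cgroup);
  }

  static inline int event_filter_match(struct perf_event *event)
  {
          return (event->cpu == -1 || event->cpu == smp_processor_id()) &&
                 perf_cgroup_match(event) && pmu_filter_match(event);
  }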

The problem is easy to observe on a machine with only one cgroup:

  $ perf stat -e cycles -I 1000 -C 0 -G /
  #          time             counts unit events
      1.000161699      <not counted>      cycles                    /
      2.000355591      <not counted>      cycles                    /
      3.000565154      <not counted>      cycles                    /
      4.000951350      <not counted>      cycles                    /

After the fix, the output is as expected:

  $ perf stat -e cycles -I 1000 -a -G /
  #         time             counts unit events
     1.004699159          627342882      cycles                    /
     2.007397156          615272690      cycles                    /
     3.010019057          616726074      cycles                    /

This patch also makes a minor style edit to list_del_event.

Rebased on peterz/queue/perf/core.

Signed-off-by: David Carrillo-Cisneros <davidcc@...gle.com>
Reviewed-by: Stephane Eranian <eranian@...gle.com>
---
 kernel/events/core.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9345028..1efa89d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1392,6 +1392,8 @@ ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
 static void
 list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 {
+	struct perf_cpu_context *cpuctx;
+
 	lockdep_assert_held(&ctx->lock);
 
 	WARN_ON_ONCE(event->attach_state & PERF_ATTACH_CONTEXT);
@@ -1412,8 +1414,20 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 		list_add_tail(&event->group_entry, list);
 	}
 
-	if (is_cgroup_event(event))
-		ctx->nr_cgroups++;
+	if (is_cgroup_event(event)) {
+		/*
+		 * If this is the first cgroup event in the context, set
+		 * cgrp so event_filter_match works.
+		 */
+		if (!ctx->nr_cgroups++) {
+			/*
+			 * Because cgroup events are always per-cpu events,
+			 * this will always be called from the right CPU.
+			 */
+			cpuctx = __get_cpu_context(ctx);
+			cpuctx->cgrp = event->cgrp;
+		}
+	}
 
 	list_add_rcu(&event->event_entry, &ctx->event_list);
 	ctx->nr_events++;
@@ -1595,18 +1609,18 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	event->attach_state &= ~PERF_ATTACH_CONTEXT;
 
 	if (is_cgroup_event(event)) {
-		ctx->nr_cgroups--;
-		/*
-		 * Because cgroup events are always per-cpu events, this will
-		 * always be called from the right CPU.
-		 */
-		cpuctx = __get_cpu_context(ctx);
 		/*
 		 * If there are no more cgroup events then clear cgrp to avoid
 		 * stale pointer in update_cgrp_time_from_cpuctx().
 		 */
-		if (!ctx->nr_cgroups)
+		if (!--ctx->nr_cgroups) {
+			/*
+			 * Because cgroup events are always per-cpu events,
+			 * this will always be called from the right CPU.
+			 */
+			cpuctx = __get_cpu_context(ctx);
 			cpuctx->cgrp = NULL;
+		}
 	}
 
 	ctx->nr_events--;
-- 
2.8.0.rc3.226.g39d4020
