linux-kernel - Re: [PATCH RESEND v5] perf/core: Fix installing arbitrary cgroup event into cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180312122437.GE4043@hirez.programming.kicks-ass.net>
Date:   Mon, 12 Mar 2018 13:24:37 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Lin Xiulei <linxiulei@...il.com>
Cc:     Jiri Olsa <jolsa@...hat.com>, mingo@...hat.com, acme@...nel.org,
        alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, Stephane Eranian <eranian@...il.com>,
        torvalds@...ux-foundation.org, linux-perf-users@...r.kernel.org,
        Brendan Gregg <brendan.d.gregg@...il.com>,
        yang_oliver@...mail.com, "leilei.lin" <leilei.lin@...baba-inc.com>
Subject: Re: [PATCH RESEND v5] perf/core: Fix installing arbitrary cgroup
 event into cpu

On Wed, Mar 07, 2018 at 07:19:15PM +0800, Lin Xiulei wrote:

> >> +     /*
> >> +      * if only the cgroup is running on this cpu
> >> +      * and cpuctx->cgrp == NULL (otherwise it would've
> >> +      * been set with running cgroup), we put this cgroup
> >> +      * into cpu context. Or it would case mismatch in
> >> +      * following cgroup events at event_filter_match()
> >> +      */
> >
> > This is utterly incomprehensible, what?
> 
> Yes, this is bit messy. I should've made it clear. This comment was supposed
> to explain the reason why I modified the if statement below.
> 
> And the logic is
> 
> 1) when cpuctx-> cgrp is NULL, we __must__ take care of how to set it
> appropriately, that means, we __have to__ check if the cgroup is running
> on the cpu

Yes, however, the changelog needs to explain why this was
changed, and the above does not explain where the old code was wrong.

In what case do we fail to do 1 with the current code?

The existing nr_cgroups logic ensures we only continue for the
first/last cgroup event for that context. Is the problem that if the
first cgroup is _not_ of the current cgroup and we correctly do _not_
set cpuctx->cgrp, a second event that _is_ of the right cgroup will then
not have the opportunity to set cpuctx->cgrp ?

> 2) when cpuctx-> cgrp is __NOT__ NULL. It means cpuctx->cgrp had been
> set appropriately by cgroup_switch() or list_update_cgroup_event() before.
> Therefore, We do __nothing__ here

Right, but my variant doubled checked it. If its _not_ NULL it must be
the current task's cgrp, otherwise we have a problem. Same difference,
more paranoia :-)

But I suppose you're right and we can avoid a bunch of looking up if
cpuctx->cgrp is already set.


How is the below patch?

---
Subject: perf/core: Fix installing cgroup events on CPU
From: "leilei.lin" <leilei.lin@...baba-inc.com>
Date: Tue, 6 Mar 2018 17:36:37 +0800

There's two problems when installing cgroup events on CPUs: firstly
list_update_cgroup_event() only tries to set cpuctx->cgrp for the
first event, if that mismatches on @cgrp we'll not try again for later
additions.

Secondly, when we install a cgroup event into an active context, only
issue an event reprogram when the event matches the current cgroup
context. This avoids a pointless event reprogramming.

Cc: acme@...nel.org
Cc: yang_oliver@...mail.com
Cc: mingo@...hat.com
Cc: jolsa@...hat.com
Cc: eranian@...il.com
Cc: torvalds@...ux-foundation.org
Cc: tglx@...utronix.de
Cc: brendan.d.gregg@...il.com
Cc: alexander.shishkin@...ux.intel.com
Signed-off-by: leilei.lin <leilei.lin@...baba-inc.com>
[peterz: Changelog and comments]
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Link: http://lkml.kernel.org/r/20180306093637.28247-1-linxiulei@gmail.com
---
 kernel/events/core.c |   46 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 11 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -937,27 +937,39 @@ list_update_cgroup_event(struct perf_eve
 	if (!is_cgroup_event(event))
 		return;
 
-	if (add && ctx->nr_cgroups++)
-		return;
-	else if (!add && --ctx->nr_cgroups)
-		return;
 	/*
 	 * Because cgroup events are always per-cpu events,
 	 * this will always be called from the right CPU.
 	 */
 	cpuctx = __get_cpu_context(ctx);
-	cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
-	/* cpuctx->cgrp is NULL unless a cgroup event is active in this CPU .*/
-	if (add) {
+
+	/*
+	 * Since setting cpuctx->cgrp is conditional on the current @cgrp
+	 * matching the event's cgroup, we must do this for every new event,
+	 * because if the first would mismatch, the second would not try again
+	 * and we would leave cpuctx->cgrp unset.
+	 */
+	if (add && !cpuctx->cgrp) {
 		struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
 
-		list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
 		if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup))
 			cpuctx->cgrp = cgrp;
-	} else {
-		list_del(cpuctx_entry);
-		cpuctx->cgrp = NULL;
 	}
+
+	if (add && ctx->nr_cgroups++)
+		return;
+	else if (!add && --ctx->nr_cgroups)
+		return;
+
+	/* no cgroup running */
+	if (!add)
+		cpuctx->cgrp = NULL;
+
+	cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
+	if (add)
+		list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
+	else
+		list_del(cpuctx_entry);
 }
 
 #else /* !CONFIG_CGROUP_PERF */
@@ -2491,6 +2503,18 @@ static int  __perf_install_in_context(vo
 		raw_spin_lock(&task_ctx->lock);
 	}
 
+#ifdef CONFIG_CGROUP_PERF
+	if (is_cgroup_event(event)) {
+		/*
+		 * If the current cgroup doesn't match the event's
+		 * cgroup, we should not try to schedule it.
+		 */
+		struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
+		reprogram = cgroup_is_descendant(cgrp->css.cgroup,
+					event->cgrp->css.cgroup);
+	}
+#endif
+
 	if (reprogram) {
 		ctx_sched_out(ctx, cpuctx, EVENT_TIME);
 		add_event_to_ctx(event, ctx);