Message-ID: <20151026202009.GX2508@worktop.programming.kicks-ass.net>
Date: Mon, 26 Oct 2015 21:20:09 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Stephane Eranian <eranian@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
"mingo@...e.hu" <mingo@...e.hu>,
"ak@...ux.intel.com" <ak@...ux.intel.com>,
Eric Dumazet <edumazet@...gle.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>
Subject: Re: [PATCH] perf/core: fix RCU issues with cgroup monitoring mode
On Mon, Oct 26, 2015 at 12:27:05PM -0700, Stephane Eranian wrote:
> Ok, that one was a bad example because yes, it grabs the ctx->lock and
> rcu_read_lock() already.
>
> But the other path:
>
> __perf_event_task_sched_out() -> perf_cgroup_sched_out() ->
>   perf_cgroup_switch()
> is accessing task_subsys_state() in perf_cgroup_sched_out() without
> ctx->lock or rcu_read_lock(). Same thing on the sched_in path.
Indeed, that part of your patch looks good.
Now if you'd done a patch per warning (roughly) with the splat included,
I could have applied some of it ;-)
> The one place where we already hold the ctx->lock is
> __perf_install_in_context().
> Same thing for __perf_event_read() -> update_cgrp_time_from_event().
> So yes, we'd need a way to tell this is okay: the cgroup cannot disappear.
Right, but this is 'hard' :/ Or at least, my jet-lagged brain isn't
currently seeing a sane way to do that.
perf_cgroup_from_task() only takes the task pointer, and we'd somehow
need to find a 'random' pmu cpuctx pointer for that.
Passing in a ctx pointer would be possible, but would not be too much
different from just telling that check to shut up.
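To make that concrete, passing a ctx in would give us something like the
below (purely illustrative -- the function name is invented and this is
not part of the patch further down):

/*
 * Illustrative sketch only, NOT the patch below: passing the ctx in
 * lets the RCU-lockdep condition check a real lock instead of a bool.
 */
static inline struct perf_cgroup *
perf_cgroup_from_task_ctx(struct task_struct *task,
                          struct perf_event_context *ctx)
{
        return container_of(task_css_check(task, perf_event_cgrp_id,
                                           lockdep_is_held(&ctx->lock)),
                            struct perf_cgroup, css);
}

But every caller that holds ctx->lock satisfies that check trivially, so
it documents little more than a plain flag does.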
Hmm, note that you even have comments in there saying these are fine
because you've done css_get(). So maybe just shut it up?
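(For anyone following along: the third argument to task_css_check() ends
up as an extra OR-ed term in the RCU-lockdep condition, so passing true
simply disarms the warning for that dereference. Roughly, simplified
from include/linux/cgroup.h -- the real condition has a few more terms:)

#define task_css_set_check(task, __c)                                  \
        rcu_dereference_check((task)->cgroups,                         \
                              lockdep_is_held(&cgroup_mutex) || (__c))

#define task_css_check(task, subsys_id, __c)                           \
        task_css_set_check((task), (__c))->subsys[(subsys_id)]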
Something like the below; combine that with the bits of your patch that
move the rcu_read_lock() out from perf_cgroup_switch() and into
perf_cgroup_sched_{in,out}(), and we should be good, right? (A rough
sketch of that combined shape follows after the diff.)
---
 include/linux/perf_event.h |  4 ++--
 kernel/events/core.c       | 14 +++++++-------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d841d33bcdc9..24f3539a85ef 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -697,9 +697,9 @@ struct perf_cgroup {
  * if there is no cgroup event for the current CPU context.
  */
 static inline struct perf_cgroup *
-perf_cgroup_from_task(struct task_struct *task)
+perf_cgroup_from_task(struct task_struct *task, bool safe)
 {
-	return container_of(task_css(task, perf_event_cgrp_id),
+	return container_of(task_css_check(task, perf_event_cgrp_id, safe),
			    struct perf_cgroup, css);
 }
 #endif /* CONFIG_CGROUP_PERF */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ea02109aee77..3633630a20df 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -435,7 +435,7 @@ static inline void update_cgrp_time_from_event(struct perf_event *event)
 	if (!is_cgroup_event(event))
 		return;
 
-	cgrp = perf_cgroup_from_task(current);
+	cgrp = perf_cgroup_from_task(current, true);
 	/*
 	 * Do not update time when cgroup is not active
 	 */
@@ -458,7 +458,7 @@ perf_cgroup_set_timestamp(struct task_struct *task,
 	if (!task || !ctx->nr_cgroups)
 		return;
 
-	cgrp = perf_cgroup_from_task(task);
+	cgrp = perf_cgroup_from_task(task, true);
 	info = this_cpu_ptr(cgrp->info);
 	info->timestamp = ctx->timestamp;
 }
@@ -523,7 +523,7 @@ static void perf_cgroup_switch(struct task_struct *task, int mode)
 				 * event_filter_match() to not have to pass
				 * task around
				 */
-				cpuctx->cgrp = perf_cgroup_from_task(task);
+				cpuctx->cgrp = perf_cgroup_from_task(task, false);
 				cpu_ctx_sched_in(cpuctx, EVENT_ALL, task);
 			}
 			perf_pmu_enable(cpuctx->ctx.pmu);
@@ -545,14 +545,14 @@ static inline void perf_cgroup_sched_out(struct task_struct *task,
 	/*
 	 * we come here when we know perf_cgroup_events > 0
 	 */
-	cgrp1 = perf_cgroup_from_task(task);
+	cgrp1 = perf_cgroup_from_task(task, false);
 
 	/*
 	 * next is NULL when called from perf_event_enable_on_exec()
 	 * that will systematically cause a cgroup_switch()
 	 */
 	if (next)
-		cgrp2 = perf_cgroup_from_task(next);
+		cgrp2 = perf_cgroup_from_task(next, false);
 
 	/*
 	 * only schedule out current cgroup events if we know
@@ -572,10 +572,10 @@ static inline void perf_cgroup_sched_in(struct task_struct *prev,
 	/*
 	 * we come here when we know perf_cgroup_events > 0
 	 */
-	cgrp1 = perf_cgroup_from_task(task);
+	cgrp1 = perf_cgroup_from_task(task, false);
 
 	/* prev can never be NULL */
-	cgrp2 = perf_cgroup_from_task(prev);
+	cgrp2 = perf_cgroup_from_task(prev, false);
 
 	/*
 	 * only need to schedule in cgroup events if we are changing
--
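For completeness, the combined shape (this patch plus your
rcu_read_lock() movement) would look roughly like so -- a sketch only,
eliding the surrounding comments:

static inline void perf_cgroup_sched_out(struct task_struct *task,
                                         struct task_struct *next)
{
        struct perf_cgroup *cgrp1;
        struct perf_cgroup *cgrp2 = NULL;

        rcu_read_lock();
        /* holding rcu_read_lock() is what makes 'false' correct here */
        cgrp1 = perf_cgroup_from_task(task, false);
        if (next)
                cgrp2 = perf_cgroup_from_task(next, false);

        /* only switch out if we are moving between cgroups */
        if (cgrp1 != cgrp2)
                perf_cgroup_switch(task, PERF_CGROUP_SWOUT);
        rcu_read_unlock();
}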