Message-Id: <20211211000652.1836690-1-namhyung@kernel.org>
Date: Fri, 10 Dec 2021 16:06:52 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Jiri Olsa <jolsa@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>,
Stephane Eranian <eranian@...gle.com>,
Andi Kleen <ak@...ux.intel.com>,
Ian Rogers <irogers@...gle.com>, Marco Elver <elver@...gle.com>
Subject: [PATCH] perf/core: Fix cgroup event list management
Active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
The list is assumed to be accessed only from the current cpu, so it is
not protected by any lock. But since commit ef54c1a476ae ("perf: Rework
perf_event_exit_event()"), this assumption no longer holds.

perf_remove_from_context() can remove an event from the context
without an IPI when the context is not active. I think this assumes a
task event context, but a cpu event context that has only cgroup
events can also be inactive at the moment, and it might become active
soon.

If the event is enabled when it is about to be closed, this path might
call perf_cgroup_event_disable() and do list_del() on the
cgrp_cpuctx_list from a different cpu.

This resulted in a crash due to an invalid list pointer access during
the cgroup list traversal on the cpu which the event belongs to.

The following program can crash my box easily:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <linux/perf_event.h>
#include <sys/stat.h>
#include <sys/syscall.h>

//#define CGROUP_ROOT "/dev/cgroup/devices"
#define CGROUP_ROOT "/sys/fs/cgroup"

int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
		    int grp, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, grp, flags);
}

int get_cgroup_fd(const char *grp)
{
	char buf[128];

	snprintf(buf, sizeof(buf), "%s/%s", CGROUP_ROOT, grp);

	/* ignore failures */
	mkdir(buf, 0755);

	return open(buf, O_RDONLY);
}

int main(int argc, char *argv[])
{
	struct perf_event_attr hw = {
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_CPU_CYCLES,
	};
	struct perf_event_attr sw = {
		.type = PERF_TYPE_SOFTWARE,
		.config = PERF_COUNT_SW_CPU_CLOCK,
	};
	int cpus = sysconf(_SC_NPROCESSORS_ONLN);
	int fd[4][cpus];
	int cgrpA, cgrpB;

	cgrpA = get_cgroup_fd("A");
	cgrpB = get_cgroup_fd("B");
	if (cgrpA < 0 || cgrpB < 0) {
		printf("failed to get cgroup fd\n");
		return 1;
	}

	while (1) {
		int i;

		for (i = 0; i < cpus; i++) {
			fd[0][i] = perf_event_open(&hw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
			fd[1][i] = perf_event_open(&sw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
			fd[2][i] = perf_event_open(&hw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
			fd[3][i] = perf_event_open(&sw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
		}

		for (i = 0; i < cpus; i++) {
			close(fd[3][i]);
			close(fd[2][i]);
			close(fd[1][i]);
			close(fd[0][i]);
		}
	}
	return 0;
}

Let's use an IPI to prevent such crashes.

Similarly, I think perf_install_in_context() should use an IPI for at
least the first cgroup event.
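
As a rough illustration of the two conditions this patch changes, here is
a user-space sketch that models them as plain predicates. struct ctx_model
and both helper names are hypothetical simplifications, not kernel code.
The boolean expressions mirror the ones in the diff below. The comments
that tie the first and last cgroup event to cgrp_cpuctx_list updates are
my reading of the description above, so treat them as an assumption.

```c
#include <stdbool.h>

/* Hypothetical model of the perf_event_context fields the checks look at. */
struct ctx_model {
	bool is_active;		/* models ctx->is_active */
	int nr_events;		/* models ctx->nr_events */
	int nr_cgroups;		/* models ctx->nr_cgroups */
};

/*
 * Removal may skip the IPI only if the context is inactive and the event
 * is either not a cgroup event or not the last cgroup event. Removing the
 * last one is assumed to touch the per-cpu cgrp_cpuctx_list.
 */
bool can_remove_without_ipi(const struct ctx_model *ctx, bool cgroup_event)
{
	return !ctx->is_active && (!cgroup_event || ctx->nr_cgroups > 1);
}

/*
 * Installation may skip the IPI only for an event in the OFF state, in a
 * context that already has events, and, for a cgroup event, only if the
 * context already has at least one cgroup event.
 */
bool can_install_without_ipi(const struct ctx_model *ctx, bool cgroup_event,
			     bool event_off)
{
	return event_off && ctx->nr_events > 0 &&
	       (ctx->nr_cgroups > 0 || !cgroup_event);
}
```

With this model, an inactive context holding a single cgroup event
(nr_cgroups == 1) no longer qualifies for the lockless removal path, which
is the case the reproducer above hits.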
Cc: Marco Elver <elver@...gle.com>
Signed-off-by: Namhyung Kim <namhyung@...nel.org>
---
kernel/events/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 30d94f68c5bd..8ebb41ab2089 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2388,7 +2388,7 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
* event_function_call() user.
*/
raw_spin_lock_irq(&ctx->lock);
- if (!ctx->is_active) {
+ if (!ctx->is_active && (!is_cgroup_event(event) || ctx->nr_cgroups > 1)) {
__perf_remove_from_context(event, __get_cpu_context(ctx),
ctx, (void *)flags);
raw_spin_unlock_irq(&ctx->lock);
@@ -2857,11 +2857,14 @@ perf_install_in_context(struct perf_event_context *ctx,
* perf_event_attr::disabled events will not run and can be initialized
* without IPI. Except when this is the first event for the context, in
* that case we need the magic of the IPI to set ctx->is_active.
+ * Similarly, the first cgroup event for the context also needs the IPI
+ * to manipulate the cgrp_cpuctx_list.
*
* The IOC_ENABLE that is sure to follow the creation of a disabled
* event will issue the IPI and reprogram the hardware.
*/
- if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
+ if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
+ ctx->nr_events && (ctx->nr_cgroups || !is_cgroup_event(event))) {
raw_spin_lock_irq(&ctx->lock);
if (ctx->task == TASK_TOMBSTONE) {
raw_spin_unlock_irq(&ctx->lock);
--
2.34.1.173.g76aa8bc2d0-goog