linux-kernel - [PATCH v2] perf/core: Fix cgroup event list management

Message-Id: <20211213065936.1965081-1-namhyung@kernel.org>
Date:   Sun, 12 Dec 2021 22:59:36 -0800
From:   Namhyung Kim <namhyung@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Stephane Eranian <eranian@...gle.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Ian Rogers <irogers@...gle.com>,
        kernel test robot <lkp@...el.com>,
        Marco Elver <elver@...gle.com>
Subject: [PATCH v2] perf/core: Fix cgroup event list management

The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
This list is accessed only from the current CPU and is not protected by
any lock.  But since commit ef54c1a476ae ("perf: Rework
perf_event_exit_event()"), this assumption no longer holds.

perf_remove_from_context() can remove an event from the context
without an IPI when the context is not active.  I think this assumes a
task event context, but a CPU event context that contains only cgroup
events can also be inactive at that moment, and it might become active
soon.

If the event is still enabled when it is about to be closed, the
removal path might call perf_cgroup_event_disable() and list_del() on
another CPU's cgrp_cpuctx_list.

This resulted in a crash due to an invalid list pointer access during
the cgroup list traversal on the CPU to which the event belongs.
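To see why an unsynchronized list_del() ends in an invalid pointer dereference, here is a minimal userspace model of the kernel's intrusive doubly linked list. It is only a sketch, not <linux/list.h>: the model_list_* names and the MODEL_POISON1/MODEL_POISON2 values are hypothetical stand-ins for list_del() and its LIST_POISON1/LIST_POISON2 pointers. Once an entry is deleted its pointers are poisoned, so a traversal that is currently standing on that entry follows a poisoned pointer on its next step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical poison values standing in for the kernel's LIST_POISON1/2. */
#define MODEL_POISON1 ((struct model_list *)(uintptr_t)0x100)
#define MODEL_POISON2 ((struct model_list *)(uintptr_t)0x122)

struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *head)
{
	head->next = head;
	head->prev = head;
}

static bool model_list_empty(const struct model_list *head)
{
	return head->next == head;
}

/* Insert @entry just before @head, i.e. at the tail of the list. */
static void model_list_add_tail(struct model_list *entry, struct model_list *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink @entry and poison its pointers, as list_del() does. */
static void model_list_del(struct model_list *entry)
{
	/* Debug check in the spirit of CONFIG_DEBUG_LIST: no double delete. */
	assert(entry->next != MODEL_POISON1 && entry->prev != MODEL_POISON2);
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = MODEL_POISON1;
	entry->prev = MODEL_POISON2;
}

/* True if stepping forward from @pos would load a poisoned pointer. */
static bool model_next_is_poisoned(const struct model_list *pos)
{
	return pos->next == MODEL_POISON1;
}
```

If the owning CPU's traversal has just reached entry `b` when another CPU runs `model_list_del(&b)`, its next step loads `MODEL_POISON1` instead of a valid node, which is the kind of invalid list pointer access described above.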

The following program can easily crash my box:

  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <linux/perf_event.h>
  #include <sys/stat.h>
  #include <sys/syscall.h>

  //#define CGROUP_ROOT  "/dev/cgroup/devices"
  #define CGROUP_ROOT  "/sys/fs/cgroup"

  int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
                      int grp, unsigned long flags)
  {
    return syscall(SYS_perf_event_open, attr, pid, cpu, grp, flags);
  }

  int get_cgroup_fd(const char *grp)
  {
    char buf[128];

    snprintf(buf, sizeof(buf), "%s/%s", CGROUP_ROOT, grp);

    /* ignore failures */
    mkdir(buf, 0755);

    return open(buf, O_RDONLY);
  }

  int main(int argc, char *argv[])
  {
    struct perf_event_attr hw = {
      .type = PERF_TYPE_HARDWARE,
      .config = PERF_COUNT_HW_CPU_CYCLES,
    };
    struct perf_event_attr sw = {
      .type = PERF_TYPE_SOFTWARE,
      .config = PERF_COUNT_SW_CPU_CLOCK,
    };
    int cpus = sysconf(_SC_NPROCESSORS_ONLN);
    int fd[4][cpus];
    int cgrpA, cgrpB;

    cgrpA = get_cgroup_fd("A");
    cgrpB = get_cgroup_fd("B");
    if (cgrpA < 0 || cgrpB < 0) {
      printf("failed to get cgroup fd\n");
      return 1;
    }

    while (1) {
      int i;

      for (i = 0; i < cpus; i++) {
        fd[0][i] = perf_event_open(&hw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
        fd[1][i] = perf_event_open(&sw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
        fd[2][i] = perf_event_open(&hw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
        fd[3][i] = perf_event_open(&sw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
      }

      for (i = 0; i < cpus; i++) {
        close(fd[3][i]);
        close(fd[2][i]);
        close(fd[1][i]);
        close(fd[0][i]);
      }
    }
    return 0;
  }

Let's use an IPI for cgroup events to prevent such crashes.
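The ownership rule the IPI restores can be illustrated with a deterministic, single-threaded simulation: each CPU's event list is modified only by code running on that CPU, and a remote CPU that wants an entry removed posts a request instead of touching the list. Everything here (struct sim_cpu, sim_send_remove(), sim_handle_ipis()) is hypothetical; the kernel instead goes through event_function_call(), which runs the function on the target CPU and waits for it, rather than a mailbox array.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_CPUS    2
#define SIM_MAX_EVENTS 8
#define SIM_MAX_REQS   8

/* Hypothetical model of one CPU: its event list plus a request mailbox. */
struct sim_cpu {
	int events[SIM_MAX_EVENTS];  /* stands in for cgrp_cpuctx_list */
	size_t nr_events;
	int pending[SIM_MAX_REQS];   /* event ids other CPUs asked us to remove */
	size_t nr_pending;
};

static struct sim_cpu sim_cpus[SIM_NR_CPUS];

/* Only ever called for the CPU we are "running on". */
static void sim_local_add(int cpu, int id)
{
	struct sim_cpu *c = &sim_cpus[cpu];

	if (c->nr_events < SIM_MAX_EVENTS)
		c->events[c->nr_events++] = id;
}

/* Only ever called for the CPU we are "running on". */
static void sim_local_remove(int cpu, int id)
{
	struct sim_cpu *c = &sim_cpus[cpu];

	for (size_t i = 0; i < c->nr_events; i++) {
		if (c->events[i] == id) {
			c->events[i] = c->events[--c->nr_events];
			return;
		}
	}
}

/* A remote CPU never touches the list; it queues a request (the "IPI"). */
static bool sim_send_remove(int target_cpu, int id)
{
	struct sim_cpu *c = &sim_cpus[target_cpu];

	if (c->nr_pending == SIM_MAX_REQS)
		return false;
	c->pending[c->nr_pending++] = id;
	return true;
}

/* The target CPU drains its mailbox and performs the removals itself. */
static void sim_handle_ipis(int cpu)
{
	struct sim_cpu *c = &sim_cpus[cpu];

	for (size_t i = 0; i < c->nr_pending; i++)
		sim_local_remove(cpu, c->pending[i]);
	c->nr_pending = 0;
}
```

Because every mutation of `events[]` happens in `sim_local_*()` on the owning CPU, a traversal on that CPU can never observe a half-removed entry; the sender only has to wait for the request to be handled.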

Similarly, I think perf_install_in_context() should use an IPI for
cgroup events too.
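The two fast-path conditions changed by the patch can be restated as plain predicates. This is a sketch: struct model_ctx, struct model_event and the model_* functions are hypothetical, and their fields only mirror the ctx->is_active, ctx->nr_events, is_cgroup_event() and __perf_effective_state() checks in the diff below. A true result means the event is handled directly under ctx->lock; false means the code falls through to the cross-CPU call path.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fields and helpers referenced in the diff. */
struct model_ctx {
	bool is_active;      /* ctx->is_active != 0 */
	int  nr_events;      /* ctx->nr_events */
};

struct model_event {
	bool is_cgroup;      /* is_cgroup_event(event) */
	bool effective_off;  /* __perf_effective_state(event) == PERF_EVENT_STATE_OFF */
};

/* perf_remove_from_context(): remove directly, without a cross-CPU call? */
static bool model_remove_fast_path(const struct model_ctx *ctx,
				   const struct model_event *event)
{
	return !ctx->is_active && !event->is_cgroup;
}

/* perf_install_in_context(): install directly, without a cross-CPU call? */
static bool model_install_fast_path(const struct model_ctx *ctx,
				    const struct model_event *event)
{
	return event->effective_off && ctx->nr_events != 0 && !event->is_cgroup;
}
```

An inactive CPU context holding only cgroup events is exactly the case where the removal fast path used to return true; with the added `!is_cgroup` term it now takes the IPI path.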

Reported-by: kernel test robot <lkp@...el.com>  # for build error
Cc: Marco Elver <elver@...gle.com>
Signed-off-by: Namhyung Kim <namhyung@...nel.org>
---
v2) simply use IPI for cgroup events

 kernel/events/core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 30d94f68c5bd..9460c083acd9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2388,7 +2388,7 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
 	 * event_function_call() user.
 	 */
 	raw_spin_lock_irq(&ctx->lock);
-	if (!ctx->is_active) {
+	if (!ctx->is_active && !is_cgroup_event(event)) {
 		__perf_remove_from_context(event, __get_cpu_context(ctx),
 					   ctx, (void *)flags);
 		raw_spin_unlock_irq(&ctx->lock);
@@ -2857,11 +2857,14 @@ perf_install_in_context(struct perf_event_context *ctx,
 	 * perf_event_attr::disabled events will not run and can be initialized
 	 * without IPI. Except when this is the first event for the context, in
 	 * that case we need the magic of the IPI to set ctx->is_active.
+	 * Similarly, cgroup events for the context also need the IPI to
+	 * manipulate the cgrp_cpuctx_list.
 	 *
 	 * The IOC_ENABLE that is sure to follow the creation of a disabled
 	 * event will issue the IPI and reprogram the hardware.
 	 */
-	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
+	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
+	    ctx->nr_events && !is_cgroup_event(event)) {
 		raw_spin_lock_irq(&ctx->lock);
 		if (ctx->task == TASK_TOMBSTONE) {
 			raw_spin_unlock_irq(&ctx->lock);

base-commit: 73743c3b092277febbf69b250ce8ebbca0525aa2
-- 
2.34.1.173.g76aa8bc2d0-goog