lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20211211000652.1836690-1-namhyung@kernel.org>
Date:   Fri, 10 Dec 2021 16:06:52 -0800
From:   Namhyung Kim <namhyung@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Stephane Eranian <eranian@...gle.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Ian Rogers <irogers@...gle.com>, Marco Elver <elver@...gle.com>
Subject: [PATCH] perf/core: Fix cgroup event list management

The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
This list is accessed from current cpu and not protected by any locks.
But from the commit ef54c1a476ae ("perf: Rework
perf_event_exit_event()"), this assumption does not hold true anymore.

In the perf_remove_from_context(), it can remove an event from the
context without an IPI when the context is not active.  I think it
assumes task event context, but it's possible for cpu event context
only with cgroup events can be inactive at the moment - and it might
become active soon.

If the event is enabled when it's about to be closed, it might call
perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list
on a different cpu.

This resulted in a crash due to an invalid list pointer access during
the cgroup list traversal on the cpu which the event belongs to.

The following program can crash my box easily..

  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <linux/perf_event.h>
  #include <sys/stat.h>
  #include <sys/syscall.h>

  //#define CGROUP_ROOT  "/dev/cgroup/devices"
  #define CGROUP_ROOT  "/sys/fs/cgroup"

  int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
                      int grp, unsigned long flags)
  {
    return syscall(SYS_perf_event_open, attr, pid, cpu, grp, flags);
  }

  int get_cgroup_fd(const char *grp)
  {
    char buf[128];

    snprintf(buf, sizeof(buf), "%s/%s", CGROUP_ROOT, grp);

    /* ignore failures */
    mkdir(buf, 0755);

    return open(buf, O_RDONLY);
  }

  int main(int argc, char *argv[])
  {
    struct perf_event_attr hw = {
      .type = PERF_TYPE_HARDWARE,
      .config = PERF_COUNT_HW_CPU_CYCLES,
    };
    struct perf_event_attr sw = {
      .type = PERF_TYPE_SOFTWARE,
      .config = PERF_COUNT_SW_CPU_CLOCK,
    };
    int cpus = sysconf(_SC_NPROCESSORS_ONLN);
    int fd[4][cpus];
    int cgrpA, cgrpB;

    cgrpA = get_cgroup_fd("A");
    cgrpB = get_cgroup_fd("B");
    if (cgrpA < 0 || cgrpB < 0) {
      printf("failed to get cgroup fd\n");
      return 1;
    }

    while (1) {
      int i;

      for (i = 0; i < cpus; i++) {
        fd[0][i] = perf_event_open(&hw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
        fd[1][i] = perf_event_open(&sw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
        fd[2][i] = perf_event_open(&hw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
        fd[3][i] = perf_event_open(&sw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
      }

      for (i = 0; i < cpus; i++) {
        close(fd[3][i]);
        close(fd[2][i]);
        close(fd[1][i]);
        close(fd[0][i]);
      }
    }
    return 0;
  }

Let's use IPI to prevent such crashes.

Similarly, I think perf_install_in_context() should use IPI for the
first cgroup event at least.

Cc: Marco Elver <elver@...gle.com>
Signed-off-by: Namhyung Kim <namhyung@...nel.org>
---
 kernel/events/core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 30d94f68c5bd..8ebb41ab2089 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2388,7 +2388,7 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
 	 * event_function_call() user.
 	 */
 	raw_spin_lock_irq(&ctx->lock);
-	if (!ctx->is_active) {
+	if (!ctx->is_active && (!is_cgroup_event(event) || ctx->nr_cgroups > 1)) {
 		__perf_remove_from_context(event, __get_cpu_context(ctx),
 					   ctx, (void *)flags);
 		raw_spin_unlock_irq(&ctx->lock);
@@ -2857,11 +2857,14 @@ perf_install_in_context(struct perf_event_context *ctx,
 	 * perf_event_attr::disabled events will not run and can be initialized
 	 * without IPI. Except when this is the first event for the context, in
 	 * that case we need the magic of the IPI to set ctx->is_active.
+	 * Similarly, the first cgroup event for the context also needs the IPI
+	 * to manipulate the cgrp_cpuctx_list.
 	 *
 	 * The IOC_ENABLE that is sure to follow the creation of a disabled
 	 * event will issue the IPI and reprogram the hardware.
 	 */
-	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
+	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
+	    ctx->nr_events && (ctx->nr_cgroups || !is_cgroup_event(event))) {
 		raw_spin_lock_irq(&ctx->lock);
 		if (ctx->task == TASK_TOMBSTONE) {
 			raw_spin_unlock_irq(&ctx->lock);
-- 
2.34.1.173.g76aa8bc2d0-goog

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ