Message-ID: <CAP045ArTufm+6m9KcnuX5Doe4U6LzKYOHtEqrhvb28U3_HE_cw@mail.gmail.com>
Date: Thu, 7 Dec 2023 15:02:01 -0800
From: Kyle Huey <me@...ehuey.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Marco Elver <elver@...gle.com>, Kyle Huey <khuey@...ehuey.com>,
open list <linux-kernel@...r.kernel.org>,
Andrii Nakryiko <andrii.nakryiko@...il.com>,
Jiri Olsa <jolsa@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>,
"Robert O'Callahan" <robert@...llahan.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
linux-perf-users@...r.kernel.org
Subject: Re: [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap
On Thu, Dec 7, 2023 at 2:38 PM Namhyung Kim <namhyung@...nel.org> wrote:
>
> Hello,
>
> On Thu, Dec 07, 2023 at 06:53:58PM +0100, Marco Elver wrote:
> > On Thu, 7 Dec 2023 at 18:47, Kyle Huey <me@...ehuey.com> wrote:
> > >
> > >
> > >
> > > On Thu, Dec 7, 2023, 9:05 AM Marco Elver <elver@...gle.com> wrote:
> > >>
> > >> On Thu, 7 Dec 2023 at 17:35, Kyle Huey <me@...ehuey.com> wrote:
> > >> >
> > >> > The perf subsystem already allows an overflow handler to clear pending_kill
> > >> > to suppress a fasync signal (although nobody currently does this). Allow an
> > >> > overflow handler to suppress the other visible side effects of an overflow,
> > >> > namely event_limit accounting and SIGTRAP generation.
>
> Well, I think it can still hit the throttling logic and generate
> a PERF_RECORD_THROTTLE. But it should be rare..
>
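For illustration, with this patch applied an in-kernel overflow handler could opt
out of all of the visible side effects simply by clearing pending_kill. A minimal
sketch (the handler name is hypothetical, not from the patch):

static void my_overflow_handler(struct perf_event *event,
				struct perf_sample_data *data,
				struct pt_regs *regs)
{
	/* consume the sample here ... */

	/*
	 * Clearing pending_kill already suppresses the fasync signal;
	 * with this patch it also skips the event_limit accounting and
	 * SIGTRAP generation that follow the handler call in
	 * __perf_event_overflow().
	 */
	event->pending_kill = 0;
}
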
> > >> >
> > >> > Signed-off-by: Kyle Huey <khuey@...ehuey.com>
> > >> > ---
> > >> >  kernel/events/core.c | 10 +++++++---
> > >> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > >> >
> > >> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > >> > index b704d83a28b2..19fddfc27a4a 100644
> > >> > --- a/kernel/events/core.c
> > >> > +++ b/kernel/events/core.c
> > >> > @@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event,
> > >> >  	 */
> > >> >
> > >> >  	event->pending_kill = POLL_IN;
> > >> > +
> > >> > +	READ_ONCE(event->overflow_handler)(event, data, regs);
> > >> > +
> > >> > +	if (!event->pending_kill)
> > >> > +		return ret;
> > >>
> > >> It's not at all intuitive that resetting pending_kill to 0 will
> > >> suppress everything else, too. There is no relationship between the
> > >> fasync signals and SIGTRAP. pending_kill is for the former and
> > >> pending_sigtrap is for the latter. One should not affect the other.
> > >>
> > >> A nicer solution would be to properly undo the various pending_*
> > >> states (in the case of pending_sigtrap being set it should be enough
> > >> to reset pending_sigtrap to 0, and also decrement
> > >> event->ctx->nr_pending).
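Roughly, that undo in the handler would look like this; a sketch using the field
names Marco mentions, assuming it runs before the sigtrap is actually delivered:

	if (event->pending_sigtrap) {
		event->pending_sigtrap = 0;
		local_dec(&event->ctx->nr_pending);
	}
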
> > >
> > >
> > > I don't believe it's possible to correctly undo the event_limit decrement after the fact (if it's e.g. racing with the ioctl that adds to the event limit).
> > >
> > >> Although I can see why this solution is simpler. Perhaps with enough
> > >> comments it might be clearer.
> > >>
> > >> Preferences?
> > >
> > >
> > > The cleanest way would probably be to add a return value to the overflow handler function that controls this. It requires changing a bunch of arch-specific code on arches I don't have access to though.
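Concretely, that would mean changing the typedef in include/linux/perf_event.h
along these lines (hypothetical sketch; the handler currently returns void):

/* hypothetical: let the handler tell the core whether to proceed */
typedef int (*perf_overflow_handler_t)(struct perf_event *event,
				       struct perf_sample_data *data,
				       struct pt_regs *regs);

and then fixing up every in-tree handler, including the arch ones, to return a
value.
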
> >
> > Hmm.
> >
> > Maybe wait for perf maintainers to say what is preferable. (I could
> > live with just making sure this has no other weird side effects and
> > more comments.)
>
> What if we can call the bpf handler directly and check the return value?
> Then I think we can also get rid of the original overflow handler.
>
> Something like this (untested..)
Mmm, that's an interesting idea. I'll experiment with it.
- Kyle
> Thanks,
> Namhyung
>
>
> ---8<---
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index e85cd1c0eaf3..1eba6f5bb70b 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -809,7 +809,6 @@ struct perf_event {
>  	perf_overflow_handler_t overflow_handler;
>  	void *overflow_handler_context;
>  #ifdef CONFIG_BPF_SYSCALL
> -	perf_overflow_handler_t orig_overflow_handler;
>  	struct bpf_prog *prog;
>  	u64 bpf_cookie;
>  #endif
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4c72a41f11af..e1a00646dbbe 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9471,6 +9471,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
>   * Generic event overflow handling, sampling.
>   */
>
> +#ifdef CONFIG_BPF_SYSCALL
> +static int bpf_overflow_handler(struct perf_event *event,
> +				struct perf_sample_data *data,
> +				struct pt_regs *regs);
> +#endif
> +
>  static int __perf_event_overflow(struct perf_event *event,
>  				 int throttle, struct perf_sample_data *data,
>  				 struct pt_regs *regs)
> @@ -9487,6 +9493,11 @@ static int __perf_event_overflow(struct perf_event *event,
>
>  	ret = __perf_event_account_interrupt(event, throttle);
>
> +#ifdef CONFIG_BPF_SYSCALL
> +	if (event->prog && bpf_overflow_handler(event, data, regs) == 0)
> +		return ret;
> +#endif
> +
>  	/*
>  	 * XXX event_limit might not quite work as expected on inherited
>  	 * events
> @@ -10346,7 +10357,7 @@ static void perf_event_free_filter(struct perf_event *event)
>  }
>
>  #ifdef CONFIG_BPF_SYSCALL
> -static void bpf_overflow_handler(struct perf_event *event,
> +static int bpf_overflow_handler(struct perf_event *event,
>  				 struct perf_sample_data *data,
>  				 struct pt_regs *regs)
>  {
> @@ -10355,7 +10366,7 @@ static void bpf_overflow_handler(struct perf_event *event,
>  		.event = event,
>  	};
>  	struct bpf_prog *prog;
> -	int ret = 0;
> +	int ret = 1;
>
>  	ctx.regs = perf_arch_bpf_user_pt_regs(regs);
>  	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
> @@ -10369,10 +10380,7 @@ static void bpf_overflow_handler(struct perf_event *event,
>  	rcu_read_unlock();
>  out:
>  	__this_cpu_dec(bpf_prog_active);
> -	if (!ret)
> -		return;
> -
> -	event->orig_overflow_handler(event, data, regs);
> +	return ret;
>  }
>
>  static int perf_event_set_bpf_handler(struct perf_event *event,
> @@ -10408,8 +10416,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,
>
>  	event->prog = prog;
>  	event->bpf_cookie = bpf_cookie;
> -	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
> -	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
>  	return 0;
>  }
>
> @@ -10420,7 +10426,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
>
>  	if (!prog)
>  		return;
>
> -	WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
>  	event->prog = NULL;
>  	bpf_prog_put(prog);
>  }
> @@ -11880,13 +11885,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
>  		overflow_handler = parent_event->overflow_handler;
>  		context = parent_event->overflow_handler_context;
>  #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
> -		if (overflow_handler == bpf_overflow_handler) {
> +		if (parent_event->prog) {
>  			struct bpf_prog *prog = parent_event->prog;
>
>  			bpf_prog_inc(prog);
>  			event->prog = prog;
> -			event->orig_overflow_handler =
> -				parent_event->orig_overflow_handler;
>  		}
>  #endif
>  	}
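
For reference, with the above a BPF program attached via PERF_EVENT_IOC_SET_BPF
would suppress the default side effects just by returning 0. A minimal
libbpf-style sketch (program name hypothetical, not part of the patch):

#include <linux/bpf.h>
#include <linux/bpf_perf_event.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

SEC("perf_event")
int suppress_overflow(struct bpf_perf_event_data *ctx)
{
	/*
	 * Returning 0 makes bpf_overflow_handler() return 0, so
	 * __perf_event_overflow() bails out before the event_limit
	 * and sigtrap logic.
	 */
	return 0;
}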