Message-ID: <ZhYXpNu0c/rcjf0r@gmail.com>
Date: Wed, 10 Apr 2024 06:37:56 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Kyle Huey <me@...ehuey.com>
Cc: Kyle Huey <khuey@...ehuey.com>, linux-kernel@...r.kernel.org,
Andrii Nakryiko <andrii.nakryiko@...il.com>,
Jiri Olsa <jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Marco Elver <elver@...gle.com>,
Yonghong Song <yonghong.song@...ux.dev>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Robert O'Callahan <robert@...llahan.org>, bpf@...r.kernel.org
Subject: Re: [RESEND PATCH v5 0/4] Combine perf and bpf for fast eval of hw
 breakpoint conditions]

* Kyle Huey <me@...ehuey.com> wrote:

> Peter, Ingo, could you take a look at this?
>
> ----
>
> rr, a userspace record and replay debugger[0], replays asynchronous
> events such as signals and context switches by essentially[1] setting a
> breakpoint at the address where the asynchronous event was delivered
> during recording with a condition that the program state matches the
> state when the event was delivered.
>
> Currently, rr uses software breakpoints that trap (via ptrace) to the
> supervisor, and evaluates the condition from the supervisor. If the
> asynchronous event is delivered in a tight loop (thus requiring the
> breakpoint condition to be repeatedly evaluated) the overhead can be
> immense. A patch to rr that uses hardware breakpoints via perf events
> with an attached BPF program to reject breakpoint hits where the
> condition is not satisfied reduces rr's replay overhead by 94% on a
> pathological (but a real customer-provided, not contrived) rr trace.
>
> The only obstacle to this approach is that while the kernel allows a BPF
> program to suppress sample output when a perf event overflows it does not
> suppress signalling the perf event fd or sending the perf event's
> SIGTRAP. This patch set redesigns __perf_overflow_handler() and
> bpf_overflow_handler() so that the former invokes the latter directly
> when appropriate rather than through the generic overflow handler
> machinery, passes the return code of the BPF program back to
> __perf_overflow_handler() to allow it to decide whether to execute the
> regular overflow handler, reorders bpf_overflow_handler() and the side
> effects of perf event overflow, changes __perf_overflow_handler() to
> suppress those side effects if the BPF program returns zero, and adds a
> selftest.
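
The control flow described in the quoted cover letter can be sketched in plain userspace C. This is only an illustrative model, not the kernel code: struct sketch_event, sketch_condition() and sketch_overflow() are hypothetical names, and the three counters merely stand in for the side effects the cover letter lists (signalling the perf event fd, sending SIGTRAP, and emitting the sample). It assumes, as the cover letter states, that a return value of zero from the BPF program suppresses those side effects.

```c
#include <stddef.h>

/* Hypothetical stand-in for a perf event; NOT the kernel's struct perf_event. */
struct sketch_event {
	/* Attached filter program, or NULL when none is attached. */
	int (*bpf_prog)(const struct sketch_event *ev);
	unsigned long state;          /* current program state at the breakpoint */
	unsigned long recorded_state; /* state captured when the event was recorded */
	int wakeups;                  /* models signalling the perf event fd */
	int sigtraps;                 /* models sending the event's SIGTRAP */
	int samples;                  /* models writing the sample output */
};

/* Models the BPF program: nonzero means "condition satisfied, let the hit through". */
static int sketch_condition(const struct sketch_event *ev)
{
	return ev->state == ev->recorded_state;
}

/*
 * Models the reordered overflow path: the filter runs first, and a zero
 * return skips every side effect instead of only the sample output.
 * Returns 1 if the side effects ran, 0 if they were suppressed.
 */
static int sketch_overflow(struct sketch_event *ev)
{
	if (ev->bpf_prog != NULL && ev->bpf_prog(ev) == 0)
		return 0;

	ev->wakeups++;
	ev->sigtraps++;
	ev->samples++;
	return 1;
}
```

In this model a breakpoint hit whose state does not match costs one function call and no signal delivery, which is the property the cover letter says rr needs; an event with no program attached behaves as before.
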

I suppose this optimization makes sense.

Patch quality still needs to be improved though - see my review comments.

Thanks,

	Ingo