Message-ID: <Y7wFJ+NF0NwnmzLa@hirez.programming.kicks-ass.net>
Date: Mon, 9 Jan 2023 13:14:31 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Jiri Olsa <jolsa@...nel.org>,
Kan Liang <kan.liang@...ux.intel.com>,
Ravi Bangoria <ravi.bangoria@....com>, bpf@...r.kernel.org
Subject: Re: [PATCH 2/3] perf/core: Set data->sample_flags in perf_prepare_sample()
On Thu, Dec 29, 2022 at 12:41:00PM -0800, Namhyung Kim wrote:
So I like the general idea; I just think it's turned into a bit of a
mess. That code is already overly branchy, which is known to hurt
performance; we should really try not to make it worse than absolutely
needed.
> kernel/events/core.c | 86 ++++++++++++++++++++++++++++++++------------
> 1 file changed, 63 insertions(+), 23 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index eacc3702654d..70bff8a04583 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7582,14 +7582,21 @@ void perf_prepare_sample(struct perf_event_header *header,
> filtered_sample_type = sample_type & ~data->sample_flags;
> __perf_event_header__init_id(header, data, event, filtered_sample_type);
>
> - if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
> - data->ip = perf_instruction_pointer(regs);
> + if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) {
> + /* attr.sample_type may not have PERF_SAMPLE_IP */
Right, but that shouldn't matter; IIRC it's OK to have more bits set in
data->sample_flags than we have set in attr.sample_type. It just means
we have data available for sample types we're (possibly) not using.
That is, I think you can simply write this like:
> + if (!(data->sample_flags & PERF_SAMPLE_IP)) {
> + data->ip = perf_instruction_pointer(regs);
> + data->sample_flags |= PERF_SAMPLE_IP;
> + }
> + }
	if (filtered_sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) {
		data->ip = perf_instruction_pointer(regs);
		data->sample_flags |= PERF_SAMPLE_IP;
	}
...
	if (filtered_sample_type & PERF_SAMPLE_CODE_PAGE_SIZE) {
		data->code_page_size = perf_get_page_size(data->ip);
		data->sample_flags |= PERF_SAMPLE_CODE_PAGE_SIZE;
	}
Then after a single perf_prepare_sample() run we have:

	pre			|	post
	------------------------------------------------
	0			|	0
	IP			|	IP
	CODE_PAGE_SIZE		|	IP|CODE_PAGE_SIZE
	IP|CODE_PAGE_SIZE	|	IP|CODE_PAGE_SIZE

So while data->sample_flags will have an extra bit set in the 3rd case,
that will not affect perf_output_sample(), which only looks at data->type
(== attr.sample_type).
And since data->sample_flags will have both bits set, a second run will
filter out both and avoid the extra work (except doing that will mess up
the branch predictors).
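
To make the table above concrete, here is a small user-space model of the
filtering, not kernel code. The struct, the function name, the bit values and
the lookup counters are all stand-ins invented for illustration; only the
branch structure follows the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in bit values for illustration; the real ones come from the perf UAPI header. */
#define SAMPLE_IP             (1ULL << 0)
#define SAMPLE_CODE_PAGE_SIZE (1ULL << 1)

/* Hypothetical, trimmed-down model of struct perf_sample_data. */
struct sample_model {
	uint64_t sample_flags;    /* which fields are already populated */
	int ip_lookups;           /* counts simulated perf_instruction_pointer() calls */
	int page_size_lookups;    /* counts simulated perf_get_page_size() calls */
};

/* Models one perf_prepare_sample() pass over the IP / CODE_PAGE_SIZE branches. */
static void prepare_model(struct sample_model *data, uint64_t sample_type)
{
	uint64_t filtered;

	assert(data);
	filtered = sample_type & ~data->sample_flags;

	/* CODE_PAGE_SIZE needs the IP, so either bit triggers the IP lookup. */
	if (filtered & (SAMPLE_IP | SAMPLE_CODE_PAGE_SIZE)) {
		data->ip_lookups++;
		data->sample_flags |= SAMPLE_IP;
	}

	if (filtered & SAMPLE_CODE_PAGE_SIZE) {
		data->page_size_lookups++;
		data->sample_flags |= SAMPLE_CODE_PAGE_SIZE;
	}
}
```

Starting from sample_flags == 0, calling prepare_model() with only
SAMPLE_CODE_PAGE_SIZE leaves both bits set (the 3rd row), and a second call
with the same sample_type takes neither branch.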
> if (sample_type & PERF_SAMPLE_CALLCHAIN) {
> int size = 1;
>
> - if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)
> + if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN) {
> data->callchain = perf_callchain(event, regs);
> + data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
> + }
>
> size += data->callchain->nr;
>
This, why can't this be:
	if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN) {
		data->callchain = perf_callchain(event, regs);
		data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
		header->size += (1 + data->callchain->nr) * sizeof(u64);
	}
I suppose this is because perf_event_header lives on the stack of the
overflow handler and all that isn't available / relevant for the BPF
thing.
And we can't pull that out into another function without adding yet
another branch fest.
However; inspired by your next patch; we can do something like so:
	if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN) {
		data->callchain = perf_callchain(event, regs);
		data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
		data->size += (1 + data->callchain->nr) * sizeof(u64);
	}
And then have __perf_event_output() (or something thereabout) do:
	perf_prepare_sample(data, event, regs);
	perf_prepare_header(&header, data, event);
	err = output_begin(&handle, data, event, header.size);
	if (err)
		goto exit;
	perf_output_sample(&handle, &header, data, event);
	perf_output_end(&handle);
With perf_prepare_header() being something like:
	header->type = PERF_RECORD_SAMPLE;
	header->size = sizeof(*header) + event->header_size + data->size;
	header->misc = perf_misc_flags(regs);
	...
Hmm ?
(same for all the other sites)
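
A user-space sketch of that split, to show the size bookkeeping only. All
names ending in _model, the struct layouts, the bit value and the record type
constant are assumptions made up for this example; misc is left at 0 because
perf_misc_flags() is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLE_CALLCHAIN (1ULL << 0)  /* stand-in bit value */
#define RECORD_SAMPLE    9            /* stand-in for PERF_RECORD_SAMPLE */

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct header_model {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct callchain_model {
	uint64_t nr;   /* number of callchain entries */
};

struct data_model {
	uint64_t sample_flags;
	uint64_t size;   /* dynamic payload size accumulated while preparing */
	const struct callchain_model *callchain;
};

/* Prepare step: fills data and accounts for its size, never touches a header. */
static void prepare_sample_model(struct data_model *data, uint64_t sample_type,
				 const struct callchain_model *chain)
{
	uint64_t filtered = sample_type & ~data->sample_flags;

	if (filtered & SAMPLE_CALLCHAIN) {
		data->callchain = chain;
		data->sample_flags |= SAMPLE_CALLCHAIN;
		/* one u64 for nr, plus nr u64 entries */
		data->size += (1 + chain->nr) * sizeof(uint64_t);
	}
}

/* Header step: derives the record size from what the prepare step recorded. */
static void prepare_header_model(struct header_model *header,
				 const struct data_model *data,
				 uint16_t event_header_size)
{
	header->type = RECORD_SAMPLE;
	header->misc = 0;   /* placeholder, perf_misc_flags() is not modelled */
	header->size = (uint16_t)(sizeof(*header) + event_header_size + data->size);
}
```

Because the size lives in data rather than in the header, a second prepare
pass over already-populated data adds nothing, and the header can be built
later by whichever caller actually has one.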