Message-ID: <Y7wFJ+NF0NwnmzLa@hirez.programming.kicks-ass.net>
Date:   Mon, 9 Jan 2023 13:14:31 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Namhyung Kim <namhyung@...nel.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...nel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Ravi Bangoria <ravi.bangoria@....com>, bpf@...r.kernel.org
Subject: Re: [PATCH 2/3] perf/core: Set data->sample_flags in
 perf_prepare_sample()

On Thu, Dec 29, 2022 at 12:41:00PM -0800, Namhyung Kim wrote:

So I like the general idea; I just think it's turned into a bit of a
mess. That code is already overly branchy, which is known to hurt
performance; we should really try not to make it worse than absolutely
needed.

>  kernel/events/core.c | 86 ++++++++++++++++++++++++++++++++------------
>  1 file changed, 63 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index eacc3702654d..70bff8a04583 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7582,14 +7582,21 @@ void perf_prepare_sample(struct perf_event_header *header,
>  	filtered_sample_type = sample_type & ~data->sample_flags;
>  	__perf_event_header__init_id(header, data, event, filtered_sample_type);
>  
> -	if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
> -		data->ip = perf_instruction_pointer(regs);
> +	if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) {
> +		/* attr.sample_type may not have PERF_SAMPLE_IP */

Right, but that shouldn't matter, IIRC it's OK to have more bits set in
data->sample_flags than we have set in attr.sample_type. It just means
we have data available for sample types we're (possibly) not using.

That is, I think you can simply write this like:

> +		if (!(data->sample_flags & PERF_SAMPLE_IP)) {
> +			data->ip = perf_instruction_pointer(regs);
> +			data->sample_flags |= PERF_SAMPLE_IP;
> +		}
> +	}

	if (filtered_sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) {
		data->ip = perf_instruction_pointer(regs);
		data->sample_flags |= PERF_SAMPLE_IP;
	}

	...

	if (filtered_sample_type & PERF_SAMPLE_CODE_PAGE_SIZE) {
		data->code_page_size = perf_get_page_size(data->ip);
		data->sample_flags |= PERF_SAMPLE_CODE_PAGE_SIZE;
	}

Then after a single perf_prepare_sample() run we have:

  pre			|	post
  ----------------------------------------
  0			|	0
  IP			|	IP
  CODE_PAGE_SIZE	|	IP|CODE_PAGE_SIZE
  IP|CODE_PAGE_SIZE	|	IP|CODE_PAGE_SIZE

So while data->sample_flags will have an extra bit set in the 3rd case,
that will not affect perf_output_sample() which only looks at data->type
(== attr.sample_type).

And since data->sample_flags will have both bits set, a second run will
filter out both and avoid the extra work (except doing that will mess up
the branch predictors).
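
To make the table concrete, here is a minimal userspace sketch of the same
filtering logic. It is not kernel code: the bit values, struct sample_data,
the fake_*() helpers and the ip_lookups counter are all stand-ins invented
for illustration, and it reads the "pre" column as the requested sample type
with data->sample_flags starting out empty.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in bit values; the real constants live in the perf UAPI header. */
#define SAMPLE_IP		(1u << 0)
#define SAMPLE_CODE_PAGE_SIZE	(1u << 1)

/* Hypothetical, heavily reduced stand-in for struct perf_sample_data. */
struct sample_data {
	uint64_t sample_flags;		/* fields that are already filled in */
	uint64_t ip;
	uint64_t code_page_size;
	int ip_lookups;			/* how often the IP lookup actually ran */
};

/* Stand-ins for perf_instruction_pointer() and perf_get_page_size(). */
static uint64_t fake_instruction_pointer(void) { return 0x1000; }
static uint64_t fake_page_size(uint64_t ip) { (void)ip; return 4096; }

void prepare_sample(struct sample_data *data, uint64_t sample_type)
{
	/* Only handle what was requested and is not filled in yet. */
	uint64_t filtered = sample_type & ~data->sample_flags;

	if (filtered & (SAMPLE_IP | SAMPLE_CODE_PAGE_SIZE)) {
		data->ip = fake_instruction_pointer();
		data->sample_flags |= SAMPLE_IP;
		data->ip_lookups++;
	}

	if (filtered & SAMPLE_CODE_PAGE_SIZE) {
		data->code_page_size = fake_page_size(data->ip);
		data->sample_flags |= SAMPLE_CODE_PAGE_SIZE;
	}

	/* Every requested bit is now marked as available. */
	assert((data->sample_flags & sample_type) == sample_type);
}

/* Run prepare_sample() 'runs' times on fresh data, return the final flags. */
uint64_t flags_after(uint64_t sample_type, int runs)
{
	struct sample_data data = {0};

	for (int i = 0; i < runs; i++)
		prepare_sample(&data, sample_type);
	return data.sample_flags;
}
```

Calling flags_after(SAMPLE_CODE_PAGE_SIZE, 1) gives both bits set, which is
the 3rd row of the table; calling prepare_sample() twice on the same data
leaves ip_lookups at 1, since the second run filters everything out.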


>  	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
>  		int size = 1;
>  
> -		if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)
> +		if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN) {
>  			data->callchain = perf_callchain(event, regs);
> +			data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
> +		}
>  
>  		size += data->callchain->nr;
>  

This, why can't this be:

	if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN) {
		data->callchain = perf_callchain(event, regs);
		data->sample_flags |= PERF_SAMPLE_CALLCHAIN;

		header->size += (1 + data->callchain->nr) * sizeof(u64);
	}

I suppose this is because perf_event_header lives on the stack of the
overflow handler and all that isn't available / relevant for the BPF
thing.

And we can't pull that out into another function without adding yet
another branch fest.

However; inspired by your next patch; we can do something like so:

	if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN) {
		data->callchain = perf_callchain(event, regs);
		data->sample_flags |= PERF_SAMPLE_CALLCHAIN;

		data->size += (1 + data->callchain->nr) * sizeof(u64);
	}

And then have __perf_event_output() (or something thereabout) do:

	perf_prepare_sample(data, event, regs);
	perf_prepare_header(&header, data, event);
	err = output_begin(&handle, data, event, header.size);
	if (err)
		goto exit;
	perf_output_sample(&handle, &header, data, event);
	perf_output_end(&handle);

With perf_prepare_header() being something like:

	header->type = PERF_RECORD_SAMPLE;
	header->size = sizeof(*header) + event->header_size + data->size;
	header->misc = perf_misc_flags(regs);
	...

Hmm ?

(same for all the other sites)
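
For illustration only, the split suggested above could look roughly like the
following userspace sketch. Nothing here is the real kernel code: the struct
layouts, fake_chain, the bit value and the record type constant are reduced
stand-ins, misc is simply zeroed because there are no regs in this model, and
the assert on the 16-bit size is my own addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_CALLCHAIN	(1u << 0)	/* stand-in bit value */
#define RECORD_SAMPLE		9		/* stand-in for PERF_RECORD_SAMPLE */

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct callchain { uint64_t nr; uint64_t ip[4]; };

struct sample_data {
	uint64_t sample_flags;		/* fields that are already filled in */
	uint64_t size;			/* dynamic payload size, summed in prepare */
	const struct callchain *callchain;
};

struct event { uint64_t sample_type; uint16_t header_size; };

struct record_header { uint32_t type; uint16_t misc; uint16_t size; };

/* Stand-in for whatever perf_callchain() would return. */
static const struct callchain fake_chain = { .nr = 3, .ip = { 0x10, 0x20, 0x30 } };

/* Step 1: fill in data and account its size; no header needed here. */
void prepare_sample(struct sample_data *data, const struct event *event)
{
	uint64_t filtered = event->sample_type & ~data->sample_flags;

	if (filtered & SAMPLE_CALLCHAIN) {
		data->callchain = &fake_chain;
		data->sample_flags |= SAMPLE_CALLCHAIN;

		/* one u64 for nr, plus nr entries */
		data->size += (1 + data->callchain->nr) * sizeof(uint64_t);
	}
}

/* Step 2: branch-free header setup from the already accumulated size. */
void prepare_header(struct record_header *header,
		    const struct sample_data *data, const struct event *event)
{
	uint64_t total = sizeof(*header) + event->header_size + data->size;

	assert(total <= UINT16_MAX);	/* header->size is only 16 bits wide */

	header->type = RECORD_SAMPLE;
	header->misc = 0;		/* no regs in this sketch */
	header->size = (uint16_t)total;
}
```

With a 3-entry callchain and an event header_size of 16, data->size ends up
at 32 and header->size at 8 + 16 + 32 = 56. A caller that only wants the data
can stop after prepare_sample(), and running it again on the same data does
not count the callchain twice.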