linux-kernel - Re: [RFC PATCH 0/5] Make eBPF programs output data to perf event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 2 Jul 2015 11:38:23 +0800
From:	He Kuang <hekuang@...wei.com>
To:	Alexei Starovoitov <ast@...mgrid.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"Wangnan (F)" <wangnan0@...wei.com>
CC:	<rostedt@...dmis.org>, <masami.hiramatsu.pt@...achi.com>,
	<mingo@...hat.com>, <acme@...hat.com>, <jolsa@...nel.org>,
	<namhyung@...nel.org>, <linux-kernel@...r.kernel.org>,
	pi3orama <pi3orama@....com>
Subject: Re: [RFC PATCH 0/5] Make eBPF programs output data to perf event



On 2015/7/2 10:48, Alexei Starovoitov wrote:
> On 7/1/15 4:58 AM, Peter Zijlstra wrote:
>>
>> But why create a separate trace buffer, it should go into the regular
>> perf buffer.
>
> +1
>
> I think
> +static char __percpu *perf_extra_trace_buf[PERF_NR_CONTEXTS];
> is redundant.
> It adds quite a bit of unnecessary complexity to the whole patch set.
>
> Also the call to bpf_output_sample() is not effective unless program
> returns 1. It's a confusing user interface.
>
> Also you cannot ever do:
>       BPF_FUNC_probe_read,
> +    BPF_FUNC_output_sample,
>       BPF_FUNC_ktime_get_ns,
> new functions must be added to the end.
>
> Why not just do:
> perf_trace_buf_prepare() + perf_trace_buf_submit() from the helper?
> No changes to current code.
> No need to call __get_data_size() and other overhead.
> The helper can be called multiple times from the same program.
> imo much cleaner.
>

Invoke perf_trace_buf_submit() will generate a second perf
event (header->type = PERF_RECORD_SAMPLE) entry which is
different from the event entry outputed by the orignial
kprobe. So the final result of the example in 00/00 patch may
like this:

   sample entry 1(from bpf_prog):
     comm timestamp1 generic_perform_write pmu_value=0x1234
                                                                                             
   sample entry 2(from original kprobe):
     comm timestamp2 generic_perform_write: (ffffffff81140b60)
                                                                                             
Compared with current implementation:
                                                                                             
   combined sample entry:
     comm timestamp generic_perform_write: (ffffffff81140b60) pmu_value=0x1234

The former two entries may be discontinuous as there are multiple
threads and kprobes to be recorded, and there's a chance that one
entry is missed but the other is recorded. What we need is the
pmu_value read when 'generic_perform_write' enters, the two
entries result is not intuitive enough and userspace tools have
to do the work to find and combine those two sample entries to
get the result.

Thank you.

> Also how about calling this helper:
> bpf_trace_buf_submit(void *stack_ptr, int size) ?
> bpf_output_sample, I think, is odd name. It's not a sample.
> May be other name?
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/