Message-ID: <5ecdcd72-255d-26d1-baf3-dc64498753c2@fb.com>
Date: Wed, 21 Aug 2019 18:43:49 +0000
From: Yonghong Song <yhs@...com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Daniel Xu <dxu@...uu.xyz>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
Song Liu <songliubraving@...com>,
Andrii Nakryiko <andriin@...com>,
"mingo@...hat.com" <mingo@...hat.com>,
"acme@...nel.org" <acme@...nel.org>,
Alexei Starovoitov <ast@...com>,
"alexander.shishkin@...ux.intel.com"
<alexander.shishkin@...ux.intel.com>,
"jolsa@...hat.com" <jolsa@...hat.com>,
"namhyung@...nel.org" <namhyung@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Kernel Team <Kernel-team@...com>,
"Arnaldo Carvalho de Melo" <acme@...hat.com>
Subject: Re: [PATCH v3 bpf-next 1/4] tracing/probe: Add
PERF_EVENT_IOC_QUERY_PROBE ioctl
On 8/21/19 11:31 AM, Peter Zijlstra wrote:
> On Wed, Aug 21, 2019 at 04:54:47PM +0000, Yonghong Song wrote:
>> Currently, in kernel/trace/bpf_trace.c, we have
>>
>> unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
>> {
>> 	unsigned int ret;
>>
>> 	if (in_nmi()) /* not supported yet */
>> 		return 1;
>>
>> 	preempt_disable();
>>
>> 	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
>
> Yes, I'm aware of that.
>
>> In the above, the events with bpf program attached will be missed
>> if the context is nmi interrupt, or if some recursion happens even with
>> the same or different bpf programs.
>> In case of recursion, the events will not be sent to ring buffer.
>
> And while that is significantly worse than what ftrace/perf have, it is
> fundamentally the same thing.
>
> perf allows (and iirc ftrace does too) 4 nested context per CPU
> (task,softirq,irq,nmi) but any recursion within those context and we
> drop stuff.
>
> The BPF stuff is just more eager to drop things on the floor, but it is
> fundamentally the same.
>
>> A lot of bpf-based tracing programs uses maps to communicate and
>> do not allocate ring buffer at all.
>
> So extending PERF_RECORD_LOST doesn't work. But PERF_FORMAT_LOST might
> still work fine; but you get to implement it for all software events.
Could you give more specifics about PERF_FORMAT_LOST? Googling
"PERF_FORMAT_LOST" yields only the two emails from this discussion :-(
>
>> Maybe we can still use ioctl based approach which is light weighted
>> compared to ring buffer approach? If a fd has bpf attached, nhit/nmisses
>> means the kprobe is processed by bpf program or not.
>
> There is nothing kprobe specific here. Kprobes just appear to be the
> only one actually accounting the recursion cases, but everyone has
> them.
Sorry, to be specific: kprobe is just an example. I actually refer to
any perf event that bpf can attach to, which in theory is any
perf event that can be opened with the "perf_event_open" syscall, although
some of them (e.g., software events?) may not have bpf hooks yet.
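
To make the nhit/nmisses accounting concrete, here is a minimal userspace sketch of a recursion guard that counts both outcomes. It is not kernel code: `probe_stats`, `guarded_call`, and `reentrant_handler` are hypothetical names, and the plain `prog_active` integer only stands in for the per-CPU `bpf_prog_active` counter quoted above.

```c
/* Hypothetical per-event counters, modeled on the nhit/nmisses idea. */
struct probe_stats {
	unsigned long nhit;    /* handler actually ran */
	unsigned long nmissed; /* event dropped because a handler was active */
};

/* Stand-in for the per-CPU bpf_prog_active counter (assumption: one CPU). */
static int prog_active;

typedef void (*handler_fn)(struct probe_stats *stats, int depth);

/*
 * Run the handler unless one is already running in this context.
 * Returns 1 if the handler ran, 0 if the event was dropped.
 */
static int guarded_call(struct probe_stats *stats, handler_fn fn, int depth)
{
	int ran = 0;

	if (++prog_active != 1) {
		/* Recursion detected: account for the drop instead of losing it. */
		stats->nmissed++;
	} else {
		stats->nhit++;
		fn(stats, depth);
		ran = 1;
	}
	prog_active--;
	return ran;
}

/* A handler that re-triggers the same event 'depth' times while running. */
static void reentrant_handler(struct probe_stats *stats, int depth)
{
	for (int i = 0; i < depth; i++)
		guarded_call(stats, reentrant_handler, depth);
}
```

Calling `guarded_call(&stats, reentrant_handler, 3)` on zeroed counters runs the outer handler once and drops the three nested attempts, so a user reading the counters can tell how many events the attached program never saw.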
>
>> Currently, for debugfs, the nhit/nmisses info is exposed at
>> {k|u}probe_profile. Alternative, we could expose the nhit/nmisses
>> in /proc/self/fdinfo/<fd>. User can query this interface to
>> get numbers.
>
> No, we're not adding stuff to procfs for this.
No problem. Just a suggestion.