linux-kernel - Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic, apply to seq files/bpf_trace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LRH.2.21.2008180945380.3461@localhost>
Date:   Tue, 18 Aug 2020 10:12:05 +0100 (IST)
From:   Alan Maguire <alan.maguire@...cle.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
cc:     Alan Maguire <alan.maguire@...cle.com>, ast@...nel.org,
        daniel@...earbox.net, andriin@...com, yhs@...com,
        linux@...musvillemoes.dk, andriy.shevchenko@...ux.intel.com,
        pmladek@...e.com, kafai@...com, songliubraving@...com,
        john.fastabend@...il.com, kpsingh@...omium.org, shuah@...nel.org,
        rdna@...com, scott.branden@...adcom.com, quentin@...valent.com,
        cneirabustos@...il.com, jakub@...udflare.com, mingo@...hat.com,
        rostedt@...dmis.org, bpf@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [RFC PATCH bpf-next 2/4] bpf: make BTF show support generic,
 apply to seq files/bpf_trace_printk


On Fri, 14 Aug 2020, Alexei Starovoitov wrote:

> On Fri, Aug 14, 2020 at 02:06:37PM +0100, Alan Maguire wrote:
> > On Wed, 12 Aug 2020, Alexei Starovoitov wrote:
> > 
> > > On Thu, Aug 06, 2020 at 03:42:23PM +0100, Alan Maguire wrote:
> > > > 
> > > > The bpf_trace_printk tracepoint is augmented with a "trace_id"
> > > > field; it is used to allow tracepoint filtering as typed display
> > > > information can easily be interspersed with other tracing data,
> > > > making it hard to read.  Specifying a trace_id will allow users
> > > > to selectively trace data, eliminating noise.
> > > 
> > > Since trace_id is not seen in trace_pipe, how do you expect users
> > > to filter by it?
> > 
> > Sorry should have specified this.  The approach is to use trace
> > instances and filtering such that we only see events associated
> > with a specific trace_id.  There's no need for the trace event to
> > actually display the trace_id - it's still usable as a filter.
> > The steps involved are:
> > 
> > 1. create a trace instance within which we can specify a fresh
> >    set of trace event enablings, filters etc.
> > 
> > mkdir /sys/kernel/debug/tracing/instances/traceid100
> > 
> > 2. enable the filter for the specific trace id
> > 
> > echo "trace_id == 100" > 
> > /sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/filter
> > 
> > 3. enable the trace event
> > 
> > echo 1 > 
> > /sys/kernel/debug/tracing/instances/events/bpf_trace/bpf_trace_printk/enable
> > 
> > 4. ensure the BPF program uses a trace_id 100 when calling bpf_trace_btf()
> 
> ouch.
> I think you interpreted the acceptance of the
> commit 7fb20f9e901e ("bpf, doc: Remove references to warning message when using bpf_trace_printk()")
> in the wrong way.
> 
> Everything that doc had said is still valid. In particular:
> -A: This is done to nudge program authors into better interfaces when
> -programs need to pass data to user space. Like bpf_perf_event_output()
> -can be used to efficiently stream data via perf ring buffer.
> -BPF maps can be used for asynchronous data sharing between kernel
> -and user space. bpf_trace_printk() should only be used for debugging.
> 
> bpf_trace_printk is for debugging only. _debugging of bpf programs themselves_.
> What you're describing above is logging and tracing. It's not debugging of programs.
> perf buffer, ring buffer, and seq_file interfaces are the right
> interfaces for tracing, logging, and kernel debugging.
> 
> > > It also feels like workaround. May be let bpf prog print the whole
> > > struct in one go with multiple new lines and call
> > > trace_bpf_trace_printk(buf) once?
> > 
> > We can do that absolutely, but I'd be interested to get your take
> > on the filtering mechanism before taking that approach.  I'll add
> > a description of the above mechanism to the cover letter and
> > patch to be clearer next time too.
> 
> I think patch 3 is no go, because it takes bpf_trace_printk in
> the wrong direction.
> Instead please refactor it to use string buffer or seq_file as an output.

Fair enough. I'm thinking a helper like

long bpf_btf_snprintf(char *str, u32 str_size, struct btf_ptr *ptr,
		      u32 ptr_size, u64 flags);

Then the user can choose perf event or ringbuf interfaces
to share the results with userspace.

> If the user happen to use bpf_trace_printk("%s", buf);
> after that to print that string buffer to trace_pipe that's user's choice.
> I can see such use case when program author wants to debug
> their bpf program. That's fine. But for kernel debugging, on demand and
> "always on" logging and tracing the documentation should point
> to sustainable interfaces that don't interfere with each other,
> can be run in parallel by multiple users, etc.
> 

The problem with bpf_trace_printk() under this approach is
that the string size for %s arguments is very limited;
bpf_trace_printk() restricts these to 64 bytes in size.
Looks like bpf_seq_printf() restricts a %s string to 128
bytes also.  We could add an additional helper for the 
bpf_seq case which calls bpf_seq_printf() for each component
in the object, i.e.

long bpf_seq_btf_printf(struct seq_file *m, struct btf_ptr *ptr,
			u32 ptr_size, u64 flags);

This would steer users away from bpf_trace_printk()
for this use case - since it can print only a small
amount of the string - while supporting all 
the other user-space communication mechanisms.

Alan