[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4Bza0n3_uukQu_gUxn3X46kVOHeu3rJPdHkh8=QYxunFiig@mail.gmail.com>
Date: Fri, 8 May 2020 11:15:50 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Yonghong Song <yhs@...com>
Cc: Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 13/20] bpf: add bpf_seq_printf and
bpf_seq_write helpers
On Wed, May 6, 2020 at 2:42 PM Yonghong Song <yhs@...com> wrote:
>
>
>
> On 5/6/20 10:37 AM, Andrii Nakryiko wrote:
> > On Sun, May 3, 2020 at 11:26 PM Yonghong Song <yhs@...com> wrote:
> >>
> >> Two helpers bpf_seq_printf and bpf_seq_write, are added for
> >> writing data to the seq_file buffer.
> >>
> >> bpf_seq_printf supports common format string flag/width/type
> >> fields so at least I can get identical results for
> >> netlink and ipv6_route targets.
> >>
> >
> > Does seq_printf() has its own format string specification? Is there
> > any documentation explaining? I was confused by few different checks
> > below...
>
> Not really. Similar to bpf_trace_printk(), since we need to
> parse format string, so we may only support a subset of
> what seq_printf() does. But we should not invent new
> formats.
>
> >
> >> For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW
> >> specifically indicates a write failure due to overflow, which
> >> means the object will be repeated in the next bpf invocation
> >> if object collection stays the same. Note that if the object
> >> collection is changed, depending how collection traversal is
> >> done, even if the object still in the collection, it may not
> >> be visited.
> >>
> >> bpf_seq_printf may return -EBUSY meaning that internal percpu
> >> buffer for memory copy of strings or other pointees is
> >> not available. Bpf program can return 1 to indicate it
> >> wants the same object to be repeated. Right now, this should not
> >> happen on no-RT kernels since migrate_enable(), which guards
> >> bpf prog call, calls preempt_enable().
> >
> > You probably meant migrate_disable()/preempt_disable(), right? But
>
> Yes, sorry for typo.
>
> > could it still happen, at least due to NMI? E.g., perf_event BPF
> > program gets triggered during bpf_iter program execution? I think for
> > perf_event_output function, we have 3 levels, for one of each possible
> > "contexts"? Should we do something like that here as well?
>
> Currently bpf_seq_printf() and bpf_seq_write() helpers can
> only be called by iter bpf programs. The iter bpf program can only
> be run on process context as it is triggered by a read() syscall.
> So one level should be enough for non-RT kernel.
>
> For RT kernel, migrate_disable does not prevent preemption,
> so it is possible task in the middle of bpf_seq_printf() might
> be preempted, so I implemented the logic to return -EBUSY.
> I think this case should be extremely rare so I only implemented
> one level nesting.
yeah, makes sense
>
> >
> >>
> >> Signed-off-by: Yonghong Song <yhs@...com>
> >> ---
> >> include/uapi/linux/bpf.h | 32 +++++-
> >> kernel/trace/bpf_trace.c | 195 +++++++++++++++++++++++++++++++++
> >> scripts/bpf_helpers_doc.py | 2 +
> >> tools/include/uapi/linux/bpf.h | 32 +++++-
> >> 4 files changed, 259 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 97ceb0f2e539..e440a9d5cca2 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -3076,6 +3076,34 @@ union bpf_attr {
> >> * See: clock_gettime(CLOCK_BOOTTIME)
> >> * Return
> >> * Current *ktime*.
> >> + *
> >
> > [...]
> >
> >> +BPF_CALL_5(bpf_seq_printf, struct seq_file *, m, char *, fmt, u32, fmt_size,
> >> + const void *, data, u32, data_len)
> >> +{
> >> + int err = -EINVAL, fmt_cnt = 0, memcpy_cnt = 0;
> >> + int i, buf_used, copy_size, num_args;
> >> + u64 params[MAX_SEQ_PRINTF_VARARGS];
> >> + struct bpf_seq_printf_buf *bufs;
> >> + const u64 *args = data;
> >> +
> >> + buf_used = this_cpu_inc_return(bpf_seq_printf_buf_used);
> >> + if (WARN_ON_ONCE(buf_used > 1)) {
> >> + err = -EBUSY;
> >> + goto out;
> >> + }
> >> +
> >> + bufs = this_cpu_ptr(&bpf_seq_printf_buf);
> >> +
> >> + /*
> >> + * bpf_check()->check_func_arg()->check_stack_boundary()
> >> + * guarantees that fmt points to bpf program stack,
> >> + * fmt_size bytes of it were initialized and fmt_size > 0
> >> + */
> >> + if (fmt[--fmt_size] != 0)
> >
> > If we allow fmt_size == 0, this will need to be changed.
>
> Currently, we do not support fmt_size == 0. Yes, if we allow, this
> needs change.
>
> >
> >> + goto out;
> >> +
> >> + if (data_len & 7)
> >> + goto out;
> >> +
> >> + for (i = 0; i < fmt_size; i++) {
> >> + if (fmt[i] == '%' && (!data || !data_len))
> >
> > So %% escaping is not supported?
>
> Yes, have not seen a need yet my ipv6_route/netlink example.
> Can certain add if there is a use case.
I can imagine this being used quite often when trying to print out
percentages... Would just suck to have to upgrade kernel just to be
able to print % character :)
>
> >
> >> + goto out;
> >> + }
> >> +
> >> + num_args = data_len / 8;
> >> +
> >> + /* check format string for allowed specifiers */
> >> + for (i = 0; i < fmt_size; i++) {
> >> + if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i]))
> >
> > why these restrictions? are they essential?
>
> This is the same restriction in bpf_trace_printk(). I guess the purpose
> is to avoid weird print. To promote bpf_iter to dump beyond asscii, I
> guess we can remove this restriction.
well, if underlying seq_printf() will fail for those "more liberal"
strings, then that would be bad. Basically, we should try to not
impose additional restrictions compared to seq_printf, but also no
need to allow more, which will be rejected by it. I haven't checked
seq_printf implementation though, so don't know what are those
restrictions.
>
> >
> >> + goto out;
> >> +
> >> + if (fmt[i] != '%')
> >> + continue;
> >> +
> >> + if (fmt_cnt >= MAX_SEQ_PRINTF_VARARGS) {
> >> + err = -E2BIG;
> >> + goto out;
> >> + }
> >> +
> >> + if (fmt_cnt >= num_args)
> >> + goto out;
> >> +
> >> + /* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
> >> + i++;
> >> +
> >> + /* skip optional "[0+-][num]" width formating field */
> >> + while (fmt[i] == '0' || fmt[i] == '+' || fmt[i] == '-')
> >
> > There could be space as well, as an alternative to 0.
>
> We can allow space. But '0' is used more common, right?
I use both space and 0 quite often, space especially with aligned strings.
>
> >
> >> + i++;
> >> + if (fmt[i] >= '1' && fmt[i] <= '9') {
> >> + i++;
> >> + while (fmt[i] >= '0' && fmt[i] <= '9')
> >> + i++;
> >> + }
> >> +
> >> + if (fmt[i] == 's') {
> >> + /* disallow any further format extensions */
> >> + if (fmt[i + 1] != 0 &&
> >> + !isspace(fmt[i + 1]) &&
> >> + !ispunct(fmt[i + 1]))
> >> + goto out;
> >
> > I'm not sure I follow this check either. printf("%sbla", "whatever")
> > is a perfectly fine format string. Unless seq_printf has some
> > additional restrictions?
>
> Yes, just some restriction inherited from bpf_trace_printk().
> Will remove.
see comment above, if we allow it here, but seq_printf() will reject
it, then there is no point
>
> >
> >> +
> >> + /* try our best to copy */
> >> + if (memcpy_cnt >= MAX_SEQ_PRINTF_MAX_MEMCPY) {
> >> + err = -E2BIG;
> >> + goto out;
> >> + }
> >> +
> >
> > [...]
> >
> >> +
> >> +static int bpf_seq_printf_btf_ids[5];
> >> +static const struct bpf_func_proto bpf_seq_printf_proto = {
> >> + .func = bpf_seq_printf,
> >> + .gpl_only = true,
> >> + .ret_type = RET_INTEGER,
> >> + .arg1_type = ARG_PTR_TO_BTF_ID,
> >> + .arg2_type = ARG_PTR_TO_MEM,
> >> + .arg3_type = ARG_CONST_SIZE,
> >
> > It feels like allowing zero shouldn't hurt too much?
>
> This is the format string, I would prefer to keep it non-zero.
yeah, makes sense, I suppose
>
> >
> >> + .arg4_type = ARG_PTR_TO_MEM_OR_NULL,
> >> + .arg5_type = ARG_CONST_SIZE_OR_ZERO,
> >> + .btf_id = bpf_seq_printf_btf_ids,
> >> +};
> >> +
> >> +BPF_CALL_3(bpf_seq_write, struct seq_file *, m, const void *, data, u32, len)
> >> +{
> >> + return seq_write(m, data, len) ? -EOVERFLOW : 0;
> >> +}
> >> +
> >> +static int bpf_seq_write_btf_ids[5];
> >> +static const struct bpf_func_proto bpf_seq_write_proto = {
> >> + .func = bpf_seq_write,
> >> + .gpl_only = true,
> >> + .ret_type = RET_INTEGER,
> >> + .arg1_type = ARG_PTR_TO_BTF_ID,
> >> + .arg2_type = ARG_PTR_TO_MEM,
> >> + .arg3_type = ARG_CONST_SIZE,
> >
> > Same, ARG_CONST_SIZE_OR_ZERO?
>
> This one, possible. Let me check.
I just remember how much trouble perf_event_output() was causing me
because it enforced >0 for data length. Even though my variable-sized
output was always >0, proving that to (especially older) verifier was
extremely hard. So the less unnecessary restrictions - the better.
>
> >
> >> + .btf_id = bpf_seq_write_btf_ids,
> >> +};
> >> +
> >
> > [...]
> >
Powered by blists - more mailing lists