netdev - Re: [PATCH bpf-next v2 13/20] bpf: add bpf_seq_printf and bpf_seq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <71cff8d8-05b9-87ef-8a12-1da3e38c4b55@fb.com>
Date:   Wed, 6 May 2020 14:42:15 -0700
From:   Yonghong Song <yhs@...com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
CC:     Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Martin KaFai Lau <kafai@...com>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 13/20] bpf: add bpf_seq_printf and
 bpf_seq_write helpers



On 5/6/20 10:37 AM, Andrii Nakryiko wrote:
> On Sun, May 3, 2020 at 11:26 PM Yonghong Song <yhs@...com> wrote:
>>
>> Two helpers bpf_seq_printf and bpf_seq_write, are added for
>> writing data to the seq_file buffer.
>>
>> bpf_seq_printf supports common format string flag/width/type
>> fields so at least I can get identical results for
>> netlink and ipv6_route targets.
>>
> 
> Does seq_printf() has its own format string specification? Is there
> any documentation explaining? I was confused by few different checks
> below...

Not really. Similar to bpf_trace_printk(), since we need to
parse format string, so we may only support a subset of
what seq_printf() does. But we should not invent new
formats.

> 
>> For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW
>> specifically indicates a write failure due to overflow, which
>> means the object will be repeated in the next bpf invocation
>> if object collection stays the same. Note that if the object
>> collection is changed, depending how collection traversal is
>> done, even if the object still in the collection, it may not
>> be visited.
>>
>> bpf_seq_printf may return -EBUSY meaning that internal percpu
>> buffer for memory copy of strings or other pointees is
>> not available. Bpf program can return 1 to indicate it
>> wants the same object to be repeated. Right now, this should not
>> happen on no-RT kernels since migrate_enable(), which guards
>> bpf prog call, calls preempt_enable().
> 
> You probably meant migrate_disable()/preempt_disable(), right? But

Yes, sorry for typo.

> could it still happen, at least due to NMI? E.g., perf_event BPF
> program gets triggered during bpf_iter program execution? I think for
> perf_event_output function, we have 3 levels, for one of each possible
> "contexts"? Should we do something like that here as well?

Currently bpf_seq_printf() and bpf_seq_write() helpers can
only be called by iter bpf programs. The iter bpf program can only
be run on process context as it is triggered by a read() syscall.
So one level should be enough for non-RT kernel.

For RT kernel, migrate_disable does not prevent preemption,
so it is possible task in the middle of bpf_seq_printf() might
be preempted, so I implemented the logic to return -EBUSY.
I think this case should be extremely rare so I only implemented
one level nesting.

> 
>>
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>>   include/uapi/linux/bpf.h       |  32 +++++-
>>   kernel/trace/bpf_trace.c       | 195 +++++++++++++++++++++++++++++++++
>>   scripts/bpf_helpers_doc.py     |   2 +
>>   tools/include/uapi/linux/bpf.h |  32 +++++-
>>   4 files changed, 259 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97ceb0f2e539..e440a9d5cca2 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3076,6 +3076,34 @@ union bpf_attr {
>>    *             See: clock_gettime(CLOCK_BOOTTIME)
>>    *     Return
>>    *             Current *ktime*.
>> + *
> 
> [...]
> 
>> +BPF_CALL_5(bpf_seq_printf, struct seq_file *, m, char *, fmt, u32, fmt_size,
>> +          const void *, data, u32, data_len)
>> +{
>> +       int err = -EINVAL, fmt_cnt = 0, memcpy_cnt = 0;
>> +       int i, buf_used, copy_size, num_args;
>> +       u64 params[MAX_SEQ_PRINTF_VARARGS];
>> +       struct bpf_seq_printf_buf *bufs;
>> +       const u64 *args = data;
>> +
>> +       buf_used = this_cpu_inc_return(bpf_seq_printf_buf_used);
>> +       if (WARN_ON_ONCE(buf_used > 1)) {
>> +               err = -EBUSY;
>> +               goto out;
>> +       }
>> +
>> +       bufs = this_cpu_ptr(&bpf_seq_printf_buf);
>> +
>> +       /*
>> +        * bpf_check()->check_func_arg()->check_stack_boundary()
>> +        * guarantees that fmt points to bpf program stack,
>> +        * fmt_size bytes of it were initialized and fmt_size > 0
>> +        */
>> +       if (fmt[--fmt_size] != 0)
> 
> If we allow fmt_size == 0, this will need to be changed.

Currently, we do not support fmt_size == 0. Yes, if we allow, this
needs change.

> 
>> +               goto out;
>> +
>> +       if (data_len & 7)
>> +               goto out;
>> +
>> +       for (i = 0; i < fmt_size; i++) {
>> +               if (fmt[i] == '%' && (!data || !data_len))
> 
> So %% escaping is not supported?

Yes, have not seen a need yet my ipv6_route/netlink example.
Can certain add if there is a use case.

> 
>> +                       goto out;
>> +       }
>> +
>> +       num_args = data_len / 8;
>> +
>> +       /* check format string for allowed specifiers */
>> +       for (i = 0; i < fmt_size; i++) {
>> +               if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i]))
> 
> why these restrictions? are they essential?

This is the same restriction in bpf_trace_printk(). I guess the purpose
is to avoid weird print. To promote bpf_iter to dump beyond asscii, I 
guess we can remove this restriction.

> 
>> +                       goto out;
>> +
>> +               if (fmt[i] != '%')
>> +                       continue;
>> +
>> +               if (fmt_cnt >= MAX_SEQ_PRINTF_VARARGS) {
>> +                       err = -E2BIG;
>> +                       goto out;
>> +               }
>> +
>> +               if (fmt_cnt >= num_args)
>> +                       goto out;
>> +
>> +               /* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
>> +               i++;
>> +
>> +               /* skip optional "[0+-][num]" width formating field */
>> +               while (fmt[i] == '0' || fmt[i] == '+'  || fmt[i] == '-')
> 
> There could be space as well, as an alternative to 0.

We can allow space. But '0' is used more common, right?

> 
>> +                       i++;
>> +               if (fmt[i] >= '1' && fmt[i] <= '9') {
>> +                       i++;
>> +                       while (fmt[i] >= '0' && fmt[i] <= '9')
>> +                               i++;
>> +               }
>> +
>> +               if (fmt[i] == 's') {
>> +                       /* disallow any further format extensions */
>> +                       if (fmt[i + 1] != 0 &&
>> +                           !isspace(fmt[i + 1]) &&
>> +                           !ispunct(fmt[i + 1]))
>> +                               goto out;
> 
> I'm not sure I follow this check either. printf("%sbla", "whatever")
> is a perfectly fine format string. Unless seq_printf has some
> additional restrictions?

Yes, just some restriction inherited from bpf_trace_printk().
Will remove.

> 
>> +
>> +                       /* try our best to copy */
>> +                       if (memcpy_cnt >= MAX_SEQ_PRINTF_MAX_MEMCPY) {
>> +                               err = -E2BIG;
>> +                               goto out;
>> +                       }
>> +
> 
> [...]
> 
>> +
>> +static int bpf_seq_printf_btf_ids[5];
>> +static const struct bpf_func_proto bpf_seq_printf_proto = {
>> +       .func           = bpf_seq_printf,
>> +       .gpl_only       = true,
>> +       .ret_type       = RET_INTEGER,
>> +       .arg1_type      = ARG_PTR_TO_BTF_ID,
>> +       .arg2_type      = ARG_PTR_TO_MEM,
>> +       .arg3_type      = ARG_CONST_SIZE,
> 
> It feels like allowing zero shouldn't hurt too much?

This is the format string, I would prefer to keep it non-zero.

> 
>> +       .arg4_type      = ARG_PTR_TO_MEM_OR_NULL,
>> +       .arg5_type      = ARG_CONST_SIZE_OR_ZERO,
>> +       .btf_id         = bpf_seq_printf_btf_ids,
>> +};
>> +
>> +BPF_CALL_3(bpf_seq_write, struct seq_file *, m, const void *, data, u32, len)
>> +{
>> +       return seq_write(m, data, len) ? -EOVERFLOW : 0;
>> +}
>> +
>> +static int bpf_seq_write_btf_ids[5];
>> +static const struct bpf_func_proto bpf_seq_write_proto = {
>> +       .func           = bpf_seq_write,
>> +       .gpl_only       = true,
>> +       .ret_type       = RET_INTEGER,
>> +       .arg1_type      = ARG_PTR_TO_BTF_ID,
>> +       .arg2_type      = ARG_PTR_TO_MEM,
>> +       .arg3_type      = ARG_CONST_SIZE,
> 
> Same, ARG_CONST_SIZE_OR_ZERO?

This one, possible. Let me check.

> 
>> +       .btf_id         = bpf_seq_write_btf_ids,
>> +};
>> +
> 
> [...]
>