[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <71cff8d8-05b9-87ef-8a12-1da3e38c4b55@fb.com>
Date: Wed, 6 May 2020 14:42:15 -0700
From: Yonghong Song <yhs@...com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
CC: Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 13/20] bpf: add bpf_seq_printf and
bpf_seq_write helpers
On 5/6/20 10:37 AM, Andrii Nakryiko wrote:
> On Sun, May 3, 2020 at 11:26 PM Yonghong Song <yhs@...com> wrote:
>>
>> Two helpers bpf_seq_printf and bpf_seq_write, are added for
>> writing data to the seq_file buffer.
>>
>> bpf_seq_printf supports common format string flag/width/type
>> fields so at least I can get identical results for
>> netlink and ipv6_route targets.
>>
>
> Does seq_printf() has its own format string specification? Is there
> any documentation explaining? I was confused by few different checks
> below...
Not really. Similar to bpf_trace_printk(), since we need to
parse format string, so we may only support a subset of
what seq_printf() does. But we should not invent new
formats.
>
>> For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW
>> specifically indicates a write failure due to overflow, which
>> means the object will be repeated in the next bpf invocation
>> if object collection stays the same. Note that if the object
>> collection is changed, depending how collection traversal is
>> done, even if the object still in the collection, it may not
>> be visited.
>>
>> bpf_seq_printf may return -EBUSY meaning that internal percpu
>> buffer for memory copy of strings or other pointees is
>> not available. Bpf program can return 1 to indicate it
>> wants the same object to be repeated. Right now, this should not
>> happen on no-RT kernels since migrate_enable(), which guards
>> bpf prog call, calls preempt_enable().
>
> You probably meant migrate_disable()/preempt_disable(), right? But
Yes, sorry for typo.
> could it still happen, at least due to NMI? E.g., perf_event BPF
> program gets triggered during bpf_iter program execution? I think for
> perf_event_output function, we have 3 levels, for one of each possible
> "contexts"? Should we do something like that here as well?
Currently bpf_seq_printf() and bpf_seq_write() helpers can
only be called by iter bpf programs. The iter bpf program can only
be run on process context as it is triggered by a read() syscall.
So one level should be enough for non-RT kernel.
For RT kernel, migrate_disable does not prevent preemption,
so it is possible task in the middle of bpf_seq_printf() might
be preempted, so I implemented the logic to return -EBUSY.
I think this case should be extremely rare so I only implemented
one level nesting.
>
>>
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>> include/uapi/linux/bpf.h | 32 +++++-
>> kernel/trace/bpf_trace.c | 195 +++++++++++++++++++++++++++++++++
>> scripts/bpf_helpers_doc.py | 2 +
>> tools/include/uapi/linux/bpf.h | 32 +++++-
>> 4 files changed, 259 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97ceb0f2e539..e440a9d5cca2 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3076,6 +3076,34 @@ union bpf_attr {
>> * See: clock_gettime(CLOCK_BOOTTIME)
>> * Return
>> * Current *ktime*.
>> + *
>
> [...]
>
>> +BPF_CALL_5(bpf_seq_printf, struct seq_file *, m, char *, fmt, u32, fmt_size,
>> + const void *, data, u32, data_len)
>> +{
>> + int err = -EINVAL, fmt_cnt = 0, memcpy_cnt = 0;
>> + int i, buf_used, copy_size, num_args;
>> + u64 params[MAX_SEQ_PRINTF_VARARGS];
>> + struct bpf_seq_printf_buf *bufs;
>> + const u64 *args = data;
>> +
>> + buf_used = this_cpu_inc_return(bpf_seq_printf_buf_used);
>> + if (WARN_ON_ONCE(buf_used > 1)) {
>> + err = -EBUSY;
>> + goto out;
>> + }
>> +
>> + bufs = this_cpu_ptr(&bpf_seq_printf_buf);
>> +
>> + /*
>> + * bpf_check()->check_func_arg()->check_stack_boundary()
>> + * guarantees that fmt points to bpf program stack,
>> + * fmt_size bytes of it were initialized and fmt_size > 0
>> + */
>> + if (fmt[--fmt_size] != 0)
>
> If we allow fmt_size == 0, this will need to be changed.
Currently, we do not support fmt_size == 0. Yes, if we allow, this
needs change.
>
>> + goto out;
>> +
>> + if (data_len & 7)
>> + goto out;
>> +
>> + for (i = 0; i < fmt_size; i++) {
>> + if (fmt[i] == '%' && (!data || !data_len))
>
> So %% escaping is not supported?
Yes, have not seen a need yet my ipv6_route/netlink example.
Can certain add if there is a use case.
>
>> + goto out;
>> + }
>> +
>> + num_args = data_len / 8;
>> +
>> + /* check format string for allowed specifiers */
>> + for (i = 0; i < fmt_size; i++) {
>> + if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i]))
>
> why these restrictions? are they essential?
This is the same restriction in bpf_trace_printk(). I guess the purpose
is to avoid weird print. To promote bpf_iter to dump beyond asscii, I
guess we can remove this restriction.
>
>> + goto out;
>> +
>> + if (fmt[i] != '%')
>> + continue;
>> +
>> + if (fmt_cnt >= MAX_SEQ_PRINTF_VARARGS) {
>> + err = -E2BIG;
>> + goto out;
>> + }
>> +
>> + if (fmt_cnt >= num_args)
>> + goto out;
>> +
>> + /* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
>> + i++;
>> +
>> + /* skip optional "[0+-][num]" width formating field */
>> + while (fmt[i] == '0' || fmt[i] == '+' || fmt[i] == '-')
>
> There could be space as well, as an alternative to 0.
We can allow space. But '0' is used more common, right?
>
>> + i++;
>> + if (fmt[i] >= '1' && fmt[i] <= '9') {
>> + i++;
>> + while (fmt[i] >= '0' && fmt[i] <= '9')
>> + i++;
>> + }
>> +
>> + if (fmt[i] == 's') {
>> + /* disallow any further format extensions */
>> + if (fmt[i + 1] != 0 &&
>> + !isspace(fmt[i + 1]) &&
>> + !ispunct(fmt[i + 1]))
>> + goto out;
>
> I'm not sure I follow this check either. printf("%sbla", "whatever")
> is a perfectly fine format string. Unless seq_printf has some
> additional restrictions?
Yes, just some restriction inherited from bpf_trace_printk().
Will remove.
>
>> +
>> + /* try our best to copy */
>> + if (memcpy_cnt >= MAX_SEQ_PRINTF_MAX_MEMCPY) {
>> + err = -E2BIG;
>> + goto out;
>> + }
>> +
>
> [...]
>
>> +
>> +static int bpf_seq_printf_btf_ids[5];
>> +static const struct bpf_func_proto bpf_seq_printf_proto = {
>> + .func = bpf_seq_printf,
>> + .gpl_only = true,
>> + .ret_type = RET_INTEGER,
>> + .arg1_type = ARG_PTR_TO_BTF_ID,
>> + .arg2_type = ARG_PTR_TO_MEM,
>> + .arg3_type = ARG_CONST_SIZE,
>
> It feels like allowing zero shouldn't hurt too much?
This is the format string, I would prefer to keep it non-zero.
>
>> + .arg4_type = ARG_PTR_TO_MEM_OR_NULL,
>> + .arg5_type = ARG_CONST_SIZE_OR_ZERO,
>> + .btf_id = bpf_seq_printf_btf_ids,
>> +};
>> +
>> +BPF_CALL_3(bpf_seq_write, struct seq_file *, m, const void *, data, u32, len)
>> +{
>> + return seq_write(m, data, len) ? -EOVERFLOW : 0;
>> +}
>> +
>> +static int bpf_seq_write_btf_ids[5];
>> +static const struct bpf_func_proto bpf_seq_write_proto = {
>> + .func = bpf_seq_write,
>> + .gpl_only = true,
>> + .ret_type = RET_INTEGER,
>> + .arg1_type = ARG_PTR_TO_BTF_ID,
>> + .arg2_type = ARG_PTR_TO_MEM,
>> + .arg3_type = ARG_CONST_SIZE,
>
> Same, ARG_CONST_SIZE_OR_ZERO?
This one, possible. Let me check.
>
>> + .btf_id = bpf_seq_write_btf_ids,
>> +};
>> +
>
> [...]
>
Powered by blists - more mailing lists