Date: Tue, 5 May 2020 13:25:16 -0700
From: Yonghong Song <yhs@...com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
CC: Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 05/20] bpf: implement bpf_seq_read() for bpf iterator

On 5/5/20 12:56 PM, Andrii Nakryiko wrote:
> On Sun, May 3, 2020 at 11:26 PM Yonghong Song <yhs@...com> wrote:
>>
>> bpf iterator uses seq_file to provide a lossless
>> way to transfer data to user space. But we want to call
>> bpf program after all objects have been traversed, and
>> bpf program may write additional data to the
>> seq_file buffer. The current seq_read() does not work
>> for this use case.
>>
>> Besides allowing stop() function to write to the buffer,
>> the bpf_seq_read() also fixed the buffer size to one page.
>> If any single call of show() or stop() will emit data
>> more than one page to cause overflow, -E2BIG error code
>> will be returned to user space.
>>
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>> kernel/bpf/bpf_iter.c | 128 ++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 128 insertions(+)
>>
>> diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
>> index 05ae04ac1eca..2674c9cbc3dc 100644
>> --- a/kernel/bpf/bpf_iter.c
>> +++ b/kernel/bpf/bpf_iter.c
>> @@ -26,6 +26,134 @@ static DEFINE_MUTEX(targets_mutex);
>> /* protect bpf_iter_link changes */
>> static DEFINE_MUTEX(link_mutex);
>>
>> +/* bpf_seq_read, a customized and simpler version for bpf iterator.
>> + * no_llseek is assumed for this file.
>> + * The following are differences from seq_read():
>> + * . fixed buffer size (PAGE_SIZE)
>> + * . assuming no_llseek
>> + * . stop() may call bpf program, handling potential overflow there
>> + */
>> +static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> + loff_t *ppos)
>> +{
>> + struct seq_file *seq = file->private_data;
>> + size_t n, offs, copied = 0;
>> + int err = 0;
>> + void *p;
>> +
>> + mutex_lock(&seq->lock);
>> +
>> + if (!seq->buf) {
>> + seq->size = PAGE_SIZE;
>> + seq->buf = kmalloc(seq->size, GFP_KERNEL);
>> + if (!seq->buf)
>> + goto Enomem;
>
> Why not just mutex_unlock and exit with -ENOMEM? Less goto'ing, more
> straightforward.
>
>> + }
>> +
>> + if (seq->count) {
>> + n = min(seq->count, size);
>> + err = copy_to_user(buf, seq->buf + seq->from, n);
>> + if (err)
>> + goto Efault;
>> + seq->count -= n;
>> + seq->from += n;
>> + copied = n;
>> + goto Done;
>> + }
>> +
>> + seq->from = 0;
>> + p = seq->op->start(seq, &seq->index);
>> + if (!p || IS_ERR(p))
>
> IS_ERR_OR_NULL?
Ack.
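
For illustration only, a minimal user-space sketch of what the combined
check does. The lower-case helpers (err_ptr, ptr_err, is_err,
is_err_or_null) are hypothetical stand-ins modeled on the kernel's
ERR_PTR/PTR_ERR/IS_ERR/IS_ERR_OR_NULL, not the kernel definitions
themselves:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: like the kernel, the top 4095 addresses encode errnos. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode a pointer back into an errno value (0 for NULL). */
static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if the pointer value falls in the errno-encoding range. */
static inline bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Same outcome as the open-coded "!p || IS_ERR(p)" in the patch. */
static inline bool is_err_or_null(const void *p)
{
	return !p || is_err(p);
}
```

With these stand-ins, is_err_or_null(NULL) and
is_err_or_null(err_ptr(-12)) are both true, while a valid object
pointer gives false. Note that ptr_err(NULL) is 0, so the
"err = PTR_ERR(p)" in the loop yields 0 when next() returns NULL.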
>
>> + goto Stop;
>> +
>> + err = seq->op->show(seq, p);
>> + if (seq_has_overflowed(seq)) {
>> + err = -E2BIG;
>> + goto Error_show;
>> + } else if (err) {
>> + /* < 0: go out, > 0: skip */
>> + if (likely(err < 0))
>> + goto Error_show;
>> + seq->count = 0;
>> + }
>
> This seems a bit more straightforward:
>
> if (seq_has_overflowed(seq))
> err = -E2BIG;
> if (err < 0)
> goto Error_show;
> else if (err > 0)
> seq->count = 0;
>
> Also, I wonder if err > 0 (so skip was requested), should we ignore
> overflow? So something like:
Thinking about the overflow vs. err > 0 case, I double-checked the
seq_file implementation again, and yes, in that case the record is
skipped. So your suggestion below looks reasonable.
>
> if (err > 0) {
> seq->count = 0;
> } else {
> if (seq_has_overflowed(seq))
> err = -E2BIG;
> if (err)
> goto Error_show;
> }
>
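
To make the agreed ordering concrete, here is a small user-space
sketch of the decision after the first show(). struct toy_seq,
toy_has_overflowed() and toy_handle_first_show() are hypothetical
names for illustration; the overflow test assumes the same
"count == size" convention as seq_has_overflowed():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the two seq_file fields that matter here. */
struct toy_seq {
	size_t count;	/* bytes currently in the buffer */
	size_t size;	/* buffer capacity */
};

/* Assumed overflow convention: a full buffer means overflow. */
static bool toy_has_overflowed(const struct toy_seq *seq)
{
	return seq->count == seq->size;
}

/*
 * Takes show()'s return value and returns 0 to continue or a negative
 * errno to take the error path. A skip request (err > 0) is checked
 * first, so it wins over any overflow.
 */
static int toy_handle_first_show(struct toy_seq *seq, int err)
{
	if (err > 0) {
		seq->count = 0;		/* skip: discard this record's output */
		return 0;
	}
	if (toy_has_overflowed(seq))
		err = -E2BIG;
	return err;
}
```

With a full buffer, toy_handle_first_show(&seq, 1) returns 0 and
resets count, while toy_handle_first_show(&seq, 0) returns -E2BIG.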
>> +
>> + while (1) {
>> + loff_t pos = seq->index;
>> +
>> + offs = seq->count;
>> + p = seq->op->next(seq, p, &seq->index);
>> + if (pos == seq->index) {
>> + pr_info_ratelimited("buggy seq_file .next function %ps "
>> + "did not updated position index\n",
>> + seq->op->next);
>> + seq->index++;
>> + }
>> +
>> + if (!p || IS_ERR(p)) {
>
> Same, IS_ERR_OR_NULL.
Ack.
>
>> + err = PTR_ERR(p);
>> + break;
>> + }
>> + if (seq->count >= size)
>> + break;
>> +
>> + err = seq->op->show(seq, p);
>> + if (seq_has_overflowed(seq)) {
>> + if (offs == 0) {
>> + err = -E2BIG;
>> + goto Error_show;
>> + }
>> + seq->count = offs;
>> + break;
>> + } else if (err) {
>> + /* < 0: go out, > 0: skip */
>> + seq->count = offs;
>> + if (likely(err < 0)) {
>> + if (offs == 0)
>> + goto Error_show;
>> + break;
>> + }
>> + }
>
> Same question here about ignoring overflow if skip was requested.
Yes, we should prioritize err > 0 over overflow.
>
>> + }
>> +Stop:
>> + offs = seq->count;
>> + /* may call bpf program */
>> + seq->op->stop(seq, p);
>> + if (seq_has_overflowed(seq)) {
>> + if (offs == 0)
>> + goto Error_stop;
>> + seq->count = offs;
>
> just want to double-check, because it's not clear from the code. If
> all the start()/show()/next() succeeded, but stop() overflown. Would
> stop() be called again on subsequent read? Would start/show/next
> handle this correctly as well?
This is supposed to be handled, unless there is a bug...
The idea is:
- if start()/show()/next() are fine and stop() overflows,
we will skip the stop() output and move on.
(If we find that skipping it takes us back to the beginning of
the buffer, we will return -E2BIG. Otherwise, we will return
0 here, and the user read() may just exit.)
- next time, when read() is called again, start() will return
NULL (since the previous next() returned NULL) and control
will jump to stop(), which will try to do another dump().
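
A rough user-space sketch of the rollback described above, for
illustration only. toy_seq, toy_write() and toy_finish_with_stop() are
hypothetical names; the write helper assumes seq_write()-like
behavior, where a write that does not fit marks the buffer as
overflowed by setting count to size:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy buffer; size must not exceed sizeof(buf). */
struct toy_seq {
	char buf[16];
	size_t count;
	size_t size;
};

static bool toy_has_overflowed(const struct toy_seq *seq)
{
	return seq->count == seq->size;
}

/* Append data, or mark overflow if it does not fit (assumed convention). */
static void toy_write(struct toy_seq *seq, const char *s, size_t len)
{
	if (seq->count + len < seq->size) {
		memcpy(seq->buf + seq->count, s, len);
		seq->count += len;
	} else {
		seq->count = seq->size;
	}
}

/*
 * Model of the Stop: label. The trailer stands in for what a bpf
 * program might emit from stop(). Returns 0 or -E2BIG.
 */
static int toy_finish_with_stop(struct toy_seq *seq, const char *trailer)
{
	size_t offs = seq->count;	/* remember where stop() output begins */

	toy_write(seq, trailer, strlen(trailer));
	if (toy_has_overflowed(seq)) {
		if (offs == 0) {
			/* stop() alone does not fit in an empty buffer */
			seq->count = 0;
			return -E2BIG;
		}
		seq->count = offs;	/* drop stop() output, keep earlier data */
	}
	return 0;
}
```

In this model, a buffer already holding 10 bytes keeps exactly those
10 bytes when the trailer overflows, and a later call starting from an
empty buffer gets the whole buffer for the trailer.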
>
>> + }
>> +
>> + n = min(seq->count, size);
>> + err = copy_to_user(buf, seq->buf, n);
>> + if (err)
>> + goto Efault;
>> + copied = n;
>> + seq->count -= n;
>> + seq->from = n;
>> +Done:
>> + if (!copied)
>> + copied = err;
>> + else
>> + *ppos += copied;
>> + mutex_unlock(&seq->lock);
>> + return copied;
>> +
>> +Error_show:
>> + seq->op->stop(seq, p);
>> +Error_stop:
>> + seq->count = 0;
>> + goto Done;
>> +
>> +Enomem:
>> + err = -ENOMEM;
>> + goto Done;
>> +
>> +Efault:
>> + err = -EFAULT;
>> + goto Done;
>
> Enomem and Efault seem completely redundant and just add goto
> complexity to this algorithm. Let's just inline `err =
> -E(NOMEM|FAULT); goto Done;` instead?
We can do this. The labels follow the original seq_read() coding
style, but I agree that we do not need to follow it.
>
>> +}
>> +
>> int bpf_iter_reg_target(struct bpf_iter_reg *reg_info)
>> {
>> struct bpf_iter_target_info *tinfo;
>> --
>> 2.24.1
>>