[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9f5bd594-9511-1df4-093d-33111fc9c2dd@fb.com>
Date: Wed, 29 Apr 2020 20:06:53 -0700
From: Alexei Starovoitov <ast@...com>
To: Yonghong Song <yhs@...com>,
Andrii Nakryiko <andrii.nakryiko@...il.com>
CC: Martin KaFai Lau <kafai@...com>, Andrii Nakryiko <andriin@...com>,
bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next v1 03/19] bpf: add bpf_map iterator
On 4/29/20 1:15 PM, Yonghong Song wrote:
>>>
>>> Even without lseek stop() will be called multiple times.
>>> If I read seq_file.c correctly it will be called before
>>> every copy_to_user(). Which means that for a lot of text
>>> (or if read() is done with small buffer) there will be
>>> plenty of start,show,show,stop sequences.
>>
>>
>> Right start/stop can be called multiple times, but seems like there
>> are clear indicators of beginning of iteration and end of iteration:
>> - start() with seq_num == 0 is start of iteration (can be called
>> multiple times, if first element overflows buffer);
>> - stop() with p == NULL is end of iteration (seems like can be called
>> multiple times as well, if user keeps read()'ing after iteration
>> completed).
>>
>> There is another problem with stop(), though. If BPF program will
>> attempt to output anything during stop(), that output will be just
>> discarded. Not great. Especially if that output overflows and we need
>
> The stop() output will not be discarded in the following cases:
> - regular show() objects overflow and stop() BPF program not called
> - regular show() objects not overflow, which means iteration is done,
> and stop() BPF program does not overflow.
>
> The stop() seq_file output will be discarded if
> - regular show() objects not overflow and stop() BPF program output
> overflows.
> - no objects to iterate, BPF program got called, but its seq_file
> write/printf will be discarded.
>
> Two options here:
> - implement Alexei suggestion to look ahead two elements to
> always having valid object and indicating the last element
> with a special flag.
> - Per Andrii's suggestion below to implement new way or to
> tweak seq_file() a little bit to resolve the above cases
> where stop() seq_file outputs being discarded.
>
> Will try to experiment with both above options...
>
>
>> to re-allocate buffer.
>>
>> We are trying to use seq_file just to reuse 140 lines of code in
>> seq_read(), which is no magic, just a simple double buffer and retry
>> piece of logic. We don't need lseek and traverse, we don't need all
>> the escaping stuff. I think bpf_iter implementation would be much
>> simpler if bpf_iter had better control over iteration. Then this whole
>> "end of iteration" behavior would be crystal clear. Should we maybe
>> reconsider again?
That's what I was advocating for some time now.
I think seq_file is barely usable as a /proc extension and completely
unusable for iterating.
All the discussions in the last few weeks are pointing out that
majority of use cases are in the iterating space instead of dumping.
Dumping human readable strings as unstable /proc extension is
a small subset. So I think we shouldn't use fs/seq_file.c.
The dance around double op->next() or introducing op->finish()
into seq_ops looks like fifth wheel to the car.
I think bpf_iter semantics and bpf prog logic would be much simpler
and easier to understand if op->read method was re-implemented
for the purpose of iterating the objects.
I mean seq_op->start/next/stop can be reused as-is to iterate
existing kernel objects like sockets, but seq_read() will not be
used. We should explicitly disable lseek and write on our
cat-able files and use new bpf_seq_read() as .read op.
This specialized bpf_seq_read() will still do a sequences of
start/show/show/stop for every copy_to_user, but we don't need to
add finish() to seq_op and hack existing seq_read().
We also will be able to provide precise seq_num into the program
instead of approximation.
bpf_seq_read wouldn't need to deal with ppos and traverse.
It wouldn't need fancy m->size<<=1 retries.
It can allocate fixed PAGE_SIZE and be done with it.
It's fine to restrict bpf progs to not dump more than 4k
chracters per object.
And we can call bpf_iter prog exactly once per element.
Plenty of pros and no real cons.
Powered by blists - more mailing lists