netdev - Re: [PATCH bpf-next v1 03/19] bpf: add bpf

Open Source and information security mailing list archives
Message-ID: <cc802671-76e6-e911-0e4e-53a4e99c69ff@fb.com>
Date:   Wed, 29 Apr 2020 13:15:02 -0700
From:   Yonghong Song <yhs@...com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Alexei Starovoitov <ast@...com>
CC:     Martin KaFai Lau <kafai@...com>, Andrii Nakryiko <andriin@...com>,
        bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next v1 03/19] bpf: add bpf_map iterator



On 4/29/20 12:19 PM, Andrii Nakryiko wrote:
> On Wed, Apr 29, 2020 at 8:34 AM Alexei Starovoitov <ast@...com> wrote:
>>
>> On 4/28/20 11:44 PM, Yonghong Song wrote:
>>>
>>>
>>> On 4/28/20 11:40 PM, Andrii Nakryiko wrote:
>>>> On Tue, Apr 28, 2020 at 11:30 PM Alexei Starovoitov <ast@...com> wrote:
>>>>>
>>>>> On 4/28/20 11:20 PM, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 4/28/20 11:08 PM, Andrii Nakryiko wrote:
>>>>>>> On Tue, Apr 28, 2020 at 10:10 PM Yonghong Song <yhs@...com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/28/20 7:44 PM, Alexei Starovoitov wrote:
>>>>>>>>> On 4/28/20 6:15 PM, Yonghong Song wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 4/28/20 5:48 PM, Alexei Starovoitov wrote:
>>>>>>>>>>> On 4/28/20 5:37 PM, Martin KaFai Lau wrote:
>>>>>>>>>>>>> +    prog = bpf_iter_get_prog(seq, sizeof(struct bpf_iter_seq_map_info),
>>>>>>>>>>>>> +                 &meta.session_id, &meta.seq_num,
>>>>>>>>>>>>> +                 v == (void *)0);
>>>>>>>>>>>> From looking at seq_file.c, when will show() be called with
>>>>>>>>>>>> "v == NULL"?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> That v == NULL here, and the whole verifier change just to allow
>>>>>>>>>>> NULL... Maybe use seq_num as an indicator of the last element
>>>>>>>>>>> instead? Like seq_num with the upper bit set to indicate that it's
>>>>>>>>>>> the last one?
>>>>>>>>>>
>>>>>>>>>> We could. But then the verifier won't have an easy way to verify that.
>>>>>>>>>> For example, the following is expected:
>>>>>>>>>>
>>>>>>>>>>          int prog(struct bpf_map *map, u64 seq_num) {
>>>>>>>>>>             if (seq_num >> 63)
>>>>>>>>>>               return 0;
>>>>>>>>>>             ... map->id ...
>>>>>>>>>>             ... map->user_cnt ...
>>>>>>>>>>          }
>>>>>>>>>>
>>>>>>>>>> But if the user writes
>>>>>>>>>>
>>>>>>>>>>          int prog(struct bpf_map *map, u64 seq_num) {
>>>>>>>>>>              ... map->id ...
>>>>>>>>>>              ... map->user_cnt ...
>>>>>>>>>>          }
>>>>>>>>>>
>>>>>>>>>> the verifier cannot easily detect the improper map pointer access
>>>>>>>>>> here, and map->id and map->user_cnt above will cause exceptions
>>>>>>>>>> and silently read as 0.
>>>>>>>>>
>>>>>>>>> I mean always pass valid object pointer into the prog.
>>>>>>>>> In above case 'map' will always be valid.
>>>>>>>>> Consider a prog that iterates over all map elements.
>>>>>>>>> It's weird that the prog would always need to do
>>>>>>>>> if (map == 0)
>>>>>>>>>       goto out;
>>>>>>>>> even if it doesn't care about finding last.
>>>>>>>>> All progs would have to have such extra 'if'.
>>>>>>>>> If we always pass a valid object, then there is no need
>>>>>>>>> for such extra checks inside the prog.
>>>>>>>>> First and last element can be indicated via seq_num
>>>>>>>>> or via another flag or via helper call like is_this_last_elem()
>>>>>>>>> or something.
>>>>>>>>
>>>>>>>> Okay, I see what you mean now. Basically this means
>>>>>>>> seq_ops->next() should try to get/maintain next two elements,
>>>>>>>
>>>>>>> What about the case when there are no elements to iterate to begin
>>>>>>> with? In that case, we still need to call bpf_prog for (empty)
>>>>>>> post-aggregation, but we have no valid element... For bpf_map
>>>>>>> iteration we could have fake empty bpf_map that would be passed, but
>>>>>>> I'm not sure it's applicable for any type of object (e.g., having a
>>>>>>> fake task_struct is probably quite a bit more problematic?)...
>>>>>>
>>>>>> Oh, yes, thanks for reminding me of this. I put a call to
>>>>>> bpf_prog in seq_ops->stop() especially to handle no object
>>>>>> case. In that case, seq_ops->start() will return NULL,
>>>>>> seq_ops->next() won't be called, and then seq_ops->stop()
>>>>>> is called. My earlier attempt hooked into next() and
>>>>>> found that it did not work in all cases.
>>>>>
>>>>> wait a sec. seq_ops->stop() is not the end.
>>>>> With lseek of seq_file it can be called multiple times.
>>>
>>> Yes, I have taken care of this. When the object is NULL,
>>> the bpf program will be called. When the object is NULL again,
>>> it won't be called. The private data remembers that it has
>>> already been called with NULL.
>>
>> Even without lseek stop() will be called multiple times.
>> If I read seq_file.c correctly it will be called before
>> every copy_to_user(). Which means that for a lot of text
>> (or if read() is done with small buffer) there will be
>> plenty of start, show, show, stop sequences.
> 
> 
> Right, start/stop can be called multiple times, but it seems like there
> are clear indicators of the beginning and the end of iteration:
> - start() with seq_num == 0 is the start of iteration (can be called
> multiple times, if the first element overflows the buffer);
> - stop() with p == NULL is the end of iteration (it seems it can be
> called multiple times as well, if the user keeps read()'ing after
> iteration completed).
> 
> There is another problem with stop(), though. If a BPF program
> attempts to output anything during stop(), that output will just be
> discarded. Not great. Especially if that output overflows and we need

The stop() output will not be discarded in the following cases:
    - the regular show() output overflows, and the stop() BPF program
      is not called;
    - the regular show() output does not overflow, which means iteration
      is done, and the stop() BPF program output does not overflow.

The stop() seq_file output will be discarded if:
    - the regular show() output does not overflow, but the stop() BPF
      program output overflows;
    - there are no objects to iterate: the BPF program is called, but
      its seq_file write/printf output is discarded.
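To make the discard cases above concrete, here is a toy user-space model, not kernel code. It assumes seq_file's convention that a write which does not fit marks the buffer as full (count == size); `struct toy_seq`, `toy_printf()` and `render()` are hypothetical names. An overflow during show() is handled by doubling the buffer and starting over, roughly as seq_read() does, while the stop()-time trailer has no retry path, so it is lost whenever it does not fit.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the buffer fields of struct seq_file. */
struct toy_seq {
	char *buf;
	size_t size;   /* capacity of buf */
	size_t count;  /* bytes used; count == size means "overflowed" */
};

/* Append formatted text; on overflow mark the buffer full, as seq_file does. */
static void toy_printf(struct toy_seq *m, const char *fmt, ...)
{
	if (m->count < m->size) {
		va_list args;
		va_start(args, fmt);
		int len = vsnprintf(m->buf + m->count, m->size - m->count, fmt, args);
		va_end(args);
		if (len >= 0 && m->count + (size_t)len < m->size) {
			m->count += (size_t)len;
			return;
		}
	}
	m->count = m->size;
}

static bool toy_overflowed(const struct toy_seq *m)
{
	return m->count == m->size;
}

/* Emit all objects, doubling the buffer and restarting whenever show()
 * overflows. Then emit a stop()-time trailer once, with no retry.
 * Returns true if the trailer was kept, false if it was lost. */
static bool render(const int *objs, size_t n, size_t size, const char *trailer)
{
	assert(size > 0);
	struct toy_seq m = { .buf = malloc(size), .size = size, .count = 0 };
	size_t i = 0;

	assert(m.buf != NULL);
	while (i < n) {
		toy_printf(&m, "obj %d\n", objs[i]);
		if (toy_overflowed(&m)) {
			/* show() overflowed: grow the buffer and start over. */
			free(m.buf);
			m.size *= 2;
			m.buf = malloc(m.size);
			assert(m.buf != NULL);
			m.count = 0;
			i = 0;
			continue;
		}
		i++;
	}

	/* stop()-time output: nothing retries it, so overflow just drops it. */
	toy_printf(&m, "%s", trailer);
	bool kept = !toy_overflowed(&m);
	free(m.buf);
	return kept;
}
```

With `render(objs, 3, 19, "done\n")` the three 6-byte object lines fit in 18 bytes, but the 5-byte trailer does not, so it is dropped: the first discard case listed above. Starting from a buffer of 8 bytes instead, the show() overflows force the buffer to grow, and the trailer then happens to fit.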

Two options here:
   - implement Alexei's suggestion to look ahead two elements, so
     that there is always a valid object, and indicate the last
     element with a special flag;
   - per Andrii's suggestion below, implement a new way, or tweak
     seq_file() a little bit, to resolve the above cases where
     stop() seq_file output is discarded.

Will try to experiment with both above options...
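A minimal user-space sketch of the look-ahead option, assuming the last element is flagged through the upper bit of seq_num as Alexei suggested earlier in the thread; `SEQ_LAST_BIT`, `struct toy_map`, `walk_lookahead()` and `count_visit()` are hypothetical. Keeping one element of look-ahead means every callback receives a valid pointer, so the callback needs no NULL check. It also shows the gap Andrii pointed out: with no elements, the callback never runs at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: the top bit of seq_num marks the last element. */
#define SEQ_LAST_BIT (UINT64_C(1) << 63)

struct toy_map {
	uint32_t id;
};

typedef void (*visit_fn)(const struct toy_map *map, uint64_t seq_num, void *ctx);

/* Visit objs[0..n) with one element of look-ahead, so every call gets a
 * non-NULL map and the final call carries SEQ_LAST_BIT. For n == 0 the
 * callback is never invoked. */
static void walk_lookahead(const struct toy_map *objs, size_t n,
			   visit_fn visit, void *ctx)
{
	assert(visit != NULL);
	const struct toy_map *cur = n > 0 ? &objs[0] : NULL;

	for (size_t seq = 0; cur != NULL; seq++) {
		/* Fetch the following element before visiting the current one. */
		const struct toy_map *next = seq + 1 < n ? &objs[seq + 1] : NULL;
		uint64_t seq_num = (uint64_t)seq;

		visit(cur, next != NULL ? seq_num : (seq_num | SEQ_LAST_BIT), ctx);
		cur = next;
	}
}

/* Example callback in the style of the prog quoted above: no NULL check. */
struct visit_stats {
	size_t calls;
	uint32_t id_sum;
	uint64_t last_seq;
};

static void count_visit(const struct toy_map *map, uint64_t seq_num, void *ctx)
{
	struct visit_stats *st = ctx;

	st->calls++;
	st->id_sum += map->id;
	if (seq_num & SEQ_LAST_BIT)
		st->last_seq = seq_num & ~SEQ_LAST_BIT;
}
```

The cost of this design is that the iterator must always be one element ahead, and the empty case still needs a separate path if a post-aggregation call is required.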


> to re-allocate buffer.
> 
> We are trying to use seq_file just to reuse 140 lines of code in
> seq_read(), which is no magic, just simple logic to double the buffer
> and retry. We don't need lseek and traverse, we don't need all
> the escaping stuff. I think the bpf_iter implementation would be much
> simpler if bpf_iter had better control over iteration. Then this whole
> "end of iteration" behavior would be crystal clear. Should we maybe
> reconsider again?
> 
> I understand we want to re-use networking iteration code, but we can
> still do that with a custom implementation of seq_read, because we are
> still using struct seq_file and following its semantics. The change would
> be to allow stop(NULL) (or any stop() call for that matter) to perform
> output (and handle retry and buffer re-allocation). Or, alternatively,
> coupled with seq_operations intercept proposal in patch #7 discussion,
> we can add extra method (e.g., finish()) that would be called after
> all elements are traversed and will allow to emit extra stuff. We can
> do that (implement finish()) in seq_read, as well, if that's going to
> fly ok with seq_file maintainers, of course.
>
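A toy model of the finish() idea, assuming a hypothetical finish() callback that runs once after the last element; `struct toy_iter` and `toy_read()` are made-up names, and a per-read element budget stands in for a full output buffer. stop() still runs after every read() chunk, while a `finished` flag in the iterator's private state (similar to the NULL-call bookkeeping described earlier in the thread) makes finish() run exactly once, including when there are no elements at all and when user space keeps calling read() after iteration completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterator state; the counters record which hooks ran. */
struct toy_iter {
	size_t n;          /* number of elements to walk */
	size_t pos;        /* index of the next element to show */
	bool finished;     /* private-data flag: finish() has already run */
	size_t shown;      /* total show() calls */
	size_t stops;      /* total stop() calls */
	size_t finishes;   /* total finish() calls */
};

/* Model one read(): show at most `budget` elements (standing in for "the
 * user buffer is full"), then call stop(). Once the walk is exhausted, run
 * finish() a single time, no matter how many read() calls follow. Returns
 * the number of elements shown by this call. */
static size_t toy_read(struct toy_iter *it, size_t budget)
{
	size_t count = 0;

	assert(budget > 0);
	while (count < budget && it->pos < it->n) {
		it->shown++;      /* show() for element it->pos */
		it->pos++;        /* next() */
		count++;
	}
	if (it->pos == it->n && !it->finished) {
		it->finished = true;
		it->finishes++;   /* finish(): emit trailer/aggregates here */
	}
	it->stops++;          /* stop() runs after every chunk */
	return count;
}
```

In the proposal, the point of calling finish() from seq_read rather than relying on stop(NULL) is that its output could be retried, with the buffer re-allocated, just like show() output; this model only counts calls and emits no text.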