[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <250f99fd-7289-a8e2-a710-560305e2d17d@gmail.com>
Date: Wed, 21 Aug 2019 17:49:33 +0900
From: Toshiaki Makita <toshiaki.makita1@...il.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: Stanislav Fomichev <sdf@...ichev.me>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
"David S. Miller" <davem@...emloft.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
bpf@...r.kernel.org, William Tu <u9012063@...il.com>
Subject: Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
On 19/08/20 (火) 3:15:46, Jakub Kicinski wrote:
I'm on vacation and replying slowly. Sorry for any inconvenience.
> On Sat, 17 Aug 2019 23:01:59 +0900, Toshiaki Makita wrote:
>> On 19/08/17 (土) 3:52:24, Jakub Kicinski wrote:
>>> On Fri, 16 Aug 2019 10:28:10 +0900, Toshiaki Makita wrote:
>>>> On 2019/08/16 4:22, Jakub Kicinski wrote:
>>>>> There's a certain allure in bringing the in-kernel BPF translation
>>>>> infrastructure forward. OTOH from system architecture perspective IMHO
>>>>> it does seem like a task best handed in user space. bpfilter can replace
>>>>> iptables completely, here we're looking at an acceleration relatively
>>>>> loosely coupled with flower.
>>>>
>>>> I don't think it's loosely coupled. Emulating TC behavior in userspace
>>>> is not so easy.
>>>>
>>>> Think about recent multi-mask support in flower. Previously userspace could
>>>> assume there is one mask and hash table for each preference in TC. After the
>>>> change TC accepts different masks with the same pref. Such a change tends to
>>>> break userspace emulation. It may ignore masks passed from flow insertion
>>>> and use the mask remembered when the first flow of the pref is inserted. It
>>>> may override the mask of all existing flows with the pref. It may fail to
>>>> insert such flows. Any of them would result in unexpected wrong datapath
>>>> handling which is critical.
>>>> I think such an emulation layer needs to be updated in sync with TC.
>>>
>>> Oh, so you're saying that if xdp_flow is merged all patches to
>>> cls_flower and netfilter which affect flow offload will be required
>>> to update xdp_flow as well?
>>
>> Hmm... you are saying that we are allowed to break other in-kernel
>> subsystem by some change? Sounds strange...
>
> No I'm not saying that, please don't put words in my mouth.
If we ignore xdp_flow when modifying something which affects flow
offload, that may cause breakage. I showed such an example using
multi-mask support. So I just wondered what you mean and guessed you
think we can break other subsystem in some situation.
I admit I should not have used the wording "you are saying...?". If it
was not unpleasant to you I'm sorry about that. But I think you should
not use it as well. I did not say "cls_flower and netfilter which affect
flow offload will be required to update xdp_flow". I guess most patches
which affect flow offload core will not break xdp_flow. In some cases
breakage may happen. In that case we need to fix xdp_flow as well.
> I'm asking you if that's your intention.
>
> Having an implementation nor support a feature of another implementation
> and degrade gracefully to the slower one is not necessarily breakage.
> We need to make a concious decision here, hence the clarifying question.
As I described above, breakage can happen in some case, and if the patch
breaks xdp_flow I think we need to fix xdp_flow at the same time. If
xdp_flow does not support newly added features but it works for existing
ones, it is OK. In the first place not all features can be offloaded to
xdp_flow. I think this is the same as HW-offload.
>>> That's a question of policy. Technically the implementation in user
>>> space is equivalent.
>>>
>>> The advantage of user space implementation is that you can add more
>>> to it and explore use cases which do not fit in the flow offload API,
>>> but are trivial for BPF. Not to mention the obvious advantage of
>>> decoupling the upgrade path.
>>
>> I understand the advantage, but I can't trust such a third-party kernel
>> emulation solution for this kind of thing which handles critical data path.
>
> That's a strange argument to make. All production data path BPF today
> comes from user space.
Probably my explanation was not sufficient. What I'm concerned about is
that this needs to emulate kernel behavior, and it is difficult.
I don't think userspace-defined datapath itself is not reliable, nor
eBPF ecosystem.
Toshiaki Makita
Powered by blists - more mailing lists