[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87zhgk152l.fsf@toke.dk>
Date: Mon, 25 Nov 2019 14:03:14 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Toshiaki Makita <toshiaki.makita1@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <jakub.kicinski@...ronome.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Pablo Neira Ayuso <pablo@...filter.org>,
Jozsef Kadlecsik <kadlec@...filter.org>,
Florian Westphal <fw@...len.de>,
Pravin B Shelar <pshelar@....org>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org,
William Tu <u9012063@...il.com>,
Stanislav Fomichev <sdf@...ichev.me>
Subject: Re: [RFC PATCH v2 bpf-next 00/15] xdp_flow: Flow offload to XDP
Toshiaki Makita <toshiaki.makita1@...il.com> writes:
> On 2019/11/22 20:54, Toke Høiland-Jørgensen wrote:
>> Toshiaki Makita <toshiaki.makita1@...il.com> writes:
>>
>>> On 2019/11/18 19:20, Toke Høiland-Jørgensen wrote:
>>>> Toshiaki Makita <toshiaki.makita1@...il.com> writes:
>>>>
>>>> [... trimming the context a bit ...]
>>>>
>>>>>>>> Take your example of TC rules: You were proposing a flow like this:
>>>>>>>>
>>>>>>>> Userspace TC rule -> kernel rule table -> eBPF map -> generated XDP
>>>>>>>> program
>>>>>>>>
>>>>>>>> Whereas what I mean is that we could do this instead:
>>>>>>>>
>>>>>>>> Userspace TC rule -> kernel rule table
>>>>>>>>
>>>>>>>> and separately
>>>>>>>>
>>>>>>>> XDP program -> bpf helper -> lookup in kernel rule table
>>>>>>>
>>>>>>> Thanks, now I see what you mean.
>>>>>>> You expect an XDP program like this, right?
>>>>>>>
>>>>>>> int xdp_tc(struct xdp_md *ctx)
>>>>>>> {
>>>>>>> int act = bpf_xdp_tc_filter(ctx);
>>>>>>> return act;
>>>>>>> }
>>>>>>
>>>>>> Yes, basically, except that the XDP program would need to parse the
>>>>>> packet first, and bpf_xdp_tc_filter() would take a parameter struct with
>>>>>> the parsed values. See the usage of bpf_fib_lookup() in
>>>>>> bpf/samples/xdp_fwd_kern.c
>>>>>>
>>>>>>> But doesn't this way lose a chance to reduce/minimize the program to
>>>>>>> only use necessary features for this device?
>>>>>>
>>>>>> Not necessarily. Since the BPF program does the packet parsing and fills
>>>>>> in the TC filter lookup data structure, it can limit what features are
>>>>>> used that way (e.g., if I only want to do IPv6, I just parse the v6
>>>>>> header, ignore TCP/UDP, and drop everything that's not IPv6). The lookup
>>>>>> helper could also have a flag argument to disable some of the lookup
>>>>>> features.
>>>>>
>>>>> It's unclear to me how to configure that.
>>>>> Use options when attaching the program? Something like
>>>>> $ xdp_tc attach eth0 --only-with ipv6
>>>>> But can users always determine their necessary features in advance?
>>>>
>>>> That's what I'm doing with xdp-filter now. But the answer to your second
>>>> question is likely to be 'probably not', so it would be good to not have
>>>> to do this :)
>>>>
>>>>> Frequent manual reconfiguration when TC rules frequently changes does
>>>>> not sound nice. Or, add hook to kernel to listen any TC filter event
>>>>> on some daemon and automatically reload the attached program?
>>>>
>>>> Doesn't have to be a kernel hook; we could enhance the userspace tooling
>>>> to do it. Say we integrate it into 'tc':
>>>>
>>>> - Add a new command 'tc xdp_accel enable <iface> --features [ipv6,etc]'
>>>> - When adding new rules, add the following logic:
>>>> - Check if XDP acceleration is enabled
>>>> - If it is, check whether the rule being added fits into the current
>>>> 'feature set' loaded on that interface.
>>>> - If the rule needs more features, reload the XDP program to one
>>>> with the needed additional features.
>>>> - Or, alternatively, just warn the user and let them manually
>>>> replace it?
>>>
>>> Ok, but there are other userspace tools to configure tc in wild.
>>> python and golang have their own netlink library project.
>>> OVS embeds TC netlink handling code in itself. There may be more tools like this.
>>> I think at least we should have rtnl notification about TC and monitor it
>>> from daemon, if we want to reload the program from userspace tools.
>>
>> A daemon would be one way to do this in cases where it needs to be
>> completely dynamic. My guess is that there are lots of environments
>> where that is not required, and where a user/administrator could
>> realistically specify ahead of time which feature set they want to
>> enable XDP acceleration for. So in my mind the way to go about this is
>> to implement the latter first, then add dynamic reconfiguration of it on
>> top when (or if) it turns out to be necessary...
>
> Hmm, but I think there is big difference between a daemon and a cli tool.
> Shouldn't we determine the design considering future usage?
Sure, we should make sure the design doesn't exclude either option. But
we also shouldn't end up in a "the perfect is the enemy of the good"
type of situation. And the kernel-side changes are likely to be somewhat
independent of what the userspace management ends up looking like...
-Toke
Powered by blists - more mailing lists