Date:   Tue, 20 Feb 2018 16:03:08 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Pablo Neira Ayuso <pablo@...filter.org>
Cc:     netfilter-devel@...r.kernel.org, davem@...emloft.net,
        netdev@...r.kernel.org, laforge@...filter.org, fw@...len.de,
        alexei.starovoitov@...il.com
Subject: Re: [PATCH RFC PoC 0/3] nftables meets bpf

Hi Pablo,

On 02/20/2018 11:58 AM, Pablo Neira Ayuso wrote:
> On Mon, Feb 19, 2018 at 08:57:39PM +0100, Daniel Borkmann wrote:
>> On 02/19/2018 05:37 PM, Pablo Neira Ayuso wrote:
>> [...]
>>> * Simplified infrastructure: We don't need the ebpf verifier complexity
>>>   either given we trust the code we generate from the kernel. We don't
>>>   need any complex userspace tooling either, just libnftnl and nft
>>>   userspace binaries.
>>>
>>> * Hardware offload: We can use this to offload rulesets to the only
>>>   smartnic driver that we have in the tree that already implements bpf
>>>   offload, hence, we can reuse this work already in place.
>>
>> In addition to Dave's points, regarding the above two: this will also
>> only work behind the verifier, since NIC offloading piggy-backs on the
>> verifier's program analysis to prepare and generate a device-specific
>> JITed BPF prog, so it's not the same as the normal host JITs (and
>> there, the in-kernel cBPF -> eBPF migration already adds a lot of
>> headaches due to different underlying assumptions coming from the two
>> flavors, even if both are eBPF insns in the end). Given this,
>> offloading will also only work for eBPF and not cBPF.
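
(For illustration only -- a rough sketch of how that offload request
looks from the loader side today: the program still goes through the
regular BPF_PROG_LOAD path, just with prog_ifindex set to the target
netdev, so the verifier's analysis runs before the device JIT ever sees
the program. The device name and the insns array here are placeholders.)

  #include <linux/bpf.h>
  #include <net/if.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /* Request NIC offload for an XDP program: same BPF_PROG_LOAD command,
   * only prog_ifindex tells the kernel which device should JIT it after
   * the verifier has analyzed the program. */
  static int load_offloaded(const struct bpf_insn *insns, unsigned int cnt)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.prog_type    = BPF_PROG_TYPE_XDP;
          attr.insns        = (__u64)(unsigned long)insns;
          attr.insn_cnt     = cnt;
          attr.license      = (__u64)(unsigned long)"GPL";
          attr.prog_ifindex = if_nametoindex("eth0");   /* placeholder */

          return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
  }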
> 
> We also have a large range of TCAM based hardware offload out there
> that will _not_ work with your BPF HW offload infrastructure. What
> this bpf infrastructure pushes into the kernel is just a blob
> expressing things in a very low-level instruction set: trying to find
> a mapping of that to the typical HW intermediate representations in the
> TCAM based HW offload world would simply be crazy.

Sure, and I think that's fine; possible ways to address this were
proposed at the last netdev conference, for example adding hints [0] in
a programmable way as meta data in front of the packet as one option to
accelerate. Other than that, for fully pushing processing into hardware
people will get a SmartNIC, and there are multiple big vendors working
on them in that area. Potentially, a few years from now they'll more
and more become a commodity in DCs, let's see. Maybe we'll be
programming them similarly to how we program graphics cards today. :-)

  [0] https://www.netdevconf.org/2.2/session.html?waskiewicz-xdpacceleration-talk
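
To make the hint idea a bit more concrete, here is a rough sketch
(layout and values are made up for illustration) of an XDP program
reserving meta data space in front of the packet and stashing a
pre-classification hint there for a later consumer:

  /* Sketch only: grow the meta data area and store a made-up hint. */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct pkt_hint {                       /* hypothetical hint layout */
          __u32 flow_mark;
  };

  SEC("xdp")
  int xdp_store_hint(struct xdp_md *ctx)
  {
          struct pkt_hint *hint;
          void *data;

          /* reserve sizeof(*hint) bytes of meta data in front of the packet */
          if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*hint)))
                  return XDP_PASS;

          data = (void *)(long)ctx->data;
          hint = (void *)(long)ctx->data_meta;
          if ((void *)(hint + 1) > data)  /* bounds check for the verifier */
                  return XDP_PASS;

          hint->flow_mark = 42;           /* placeholder classification hint */
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";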

>> There's a lot more the verifier is doing internally, like performing
>> various different program rewrites for context access, for helpers
>> (e.g. inlining), and for internal insn mappings that are not exposed
>> (e.g. in calls), so we definitely need to go through it.
> 
> If we need to call the verifier from the kernel for the code that we
> generate there for this initial stage, that should not be an issue.
> 
> The BPF interface is lacking many of the features and flexibility we
> have in netlink these days, and it only allows for monolithic
> ruleset replacement. This approach also loses the internal stateful rule

That only depends on how you partition your program; a partial
reconfiguration is definitely possible and is done today, for example
as discussed for the LB use case, where the packet processing is staged
e.g. into sampling, DDoS mitigation, and encap + redirect phases, and
each of those components can be replaced atomically at runtime (a rough
sketch below). So there is definitely flexibility available.
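
As a rough illustration of that staging (stage names and indices are
made up), a prog array plus tail calls is one way to do it; user space
can then swap any one slot atomically via bpf_map_update_elem() without
touching the other stages:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  enum { STAGE_SAMPLE, STAGE_DDOS, STAGE_REDIRECT, NR_STAGES };

  /* One slot per processing stage; replacing a slot from user space
   * atomically swaps that stage only. */
  struct {
          __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
          __uint(max_entries, NR_STAGES);
          __uint(key_size, sizeof(__u32));
          __uint(value_size, sizeof(__u32));
  } stages SEC(".maps");

  SEC("xdp")
  int xdp_entry(struct xdp_md *ctx)
  {
          bpf_tail_call(ctx, &stages, STAGE_SAMPLE);
          /* only reached when the slot is empty */
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";

Each stage program ends by tail-calling into the next slot, so e.g. the
DDoS mitigation stage can be replaced while sampling and encap +
redirect keep running unchanged.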

Thanks,
Daniel

> information that we're maintaining in the packet path when updating the
> ruleset. So it's taking us back to exactly the same mistakes we made
> in iptables back in the 90s, as has been mentioned already.
> 
> So I just hope I can count on your help in this process; we can get
> the best of both worlds by providing a subsystem that allows users
> to configure packet classification through one single interface, no
> matter whether the policy representation ends up being in software or
> in HW offloads, either TCAM or smartnic.
> 
> Thanks.
> 
