Date:   Tue, 6 Mar 2018 17:42:29 +0100
From:   Florian Westphal <fw@...len.de>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Florian Westphal <fw@...len.de>, netdev@...r.kernel.org,
        ast@...nel.org, pablo@...filter.org
Subject: Re: [RFC,POC] iptables/nftables to ebpf/xdp via common intermediate
 layer

Daniel Borkmann <daniel@...earbox.net> wrote:
> On 03/04/2018 08:40 PM, Florian Westphal wrote:
> > It's still in an early stage, but I think it's good enough as
> > a proof-of-concept.
> 
> be the critical part in that it needs to be flexible enough to cover
> both front ends well enough without having to make compromises to
> one of them.

Yes, that's the idea.

> The same would be for optimization passes e.g. when we
> know that two successive rules would match on TCP header bits that we
> can reuse the register that loaded/parsed it previously to that point.

Yes, I already thought about this.

A first step would be to simply track the last accessed header base
(mac, nh, th) in the IMR state so we can reuse the offset if possible.
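
A rough sketch of what that could look like (the names here are made up
for illustration, they are not from the POC):

    enum imr_hdr_base {
            IMR_HDR_BASE_INVALID,
            IMR_HDR_BASE_LL,        /* mac header */
            IMR_HDR_BASE_NH,        /* network header */
            IMR_HDR_BASE_TH,        /* transport header */
    };

    struct imr_state {
            /* ... registers, objects, etc. ... */
            enum imr_hdr_base last_base; /* base of last payload access */
            int base_reg;                /* ebpf register caching that offset */
    };

When translating the next payload access, code gen would check
last_base and, if it matches, reuse base_reg instead of recomputing
the offset.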

> Similar when it comes to maps when the lookup value would need to
> propagate through the linked imr objects. Do you see such optimizations
> or in general propagation of state as direct part of the IMR or rather
> somewhat hidden in IMR layer when doing the IMR to BPF 'jit' phase?

Direct part.  I would only say that it should be hidden in the sense that
frontends don't need to do anything here.
I would do such propagation in the IMR before generating code.

The IMR has enough info to figure out:
1. the maximum number of (ebpf) registers needed simultaneously
2. how many payload requests relative to each base (mac/network/transport
   header) there are (both in total and in relation to the rule count).

So it could e.g. decide to set aside 2 registers to hold the th/nh
offsets so they are ready at all times, while not covering the mac
header at all because it is not used anywhere.

Likewise, we could even avoid, for instance, extension header parsing,
as the IMR would see that there are no 'IPv6+transport header' rules.

> Which other parts do you think would be needed for the IMR aside
> from above basic operations? ALU ops, JMPs for the relational ops?

We'll need at least the following IMR features/objects (pseudo instructions):

- relops: gt, ge, lt, etc.
- all bitops (shift, xor, and, etc.)
- payload: allow stores instead of loads (this includes csum fixup)
- verdict: jumps to other chains, nfqueue verdict
- meta: accesses to input/output interface names, skb length, and the like
- register allocation for ipv6 addresses, interface names, etc. (anything
  larger than 64 bits)
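
In pseudo-C the resulting op set might look roughly like this (purely
illustrative, not an actual enum from the POC):

    enum imr_op_type {
            IMR_OP_RELOP,   /* eq, ne, gt, ge, lt, le */
            IMR_OP_ALU,     /* and, or, xor, lshift, rshift, ... */
            IMR_OP_PAYLOAD, /* load/store relative to mac/nh/th, csum fixup on store */
            IMR_OP_META,    /* iifname, oifname, skb length, ... */
            IMR_OP_VERDICT, /* accept/drop, jump to chain, nfqueue verdict */
    };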

I would also add 'high-level' objects that are themselves translated
into basic operations.  The most obvious example is 'fetch 4 bytes at
offset x into the transport header'.

The frontend should not need to bother with ipv4 details, such as ip
options.  Instead, the IMR should take care of this and generate the
instructions needed to fetch iph->ihl and figure out the correct
transport header offset.

It could do so by internally adding the needed parts of the rule (fetch
the byte containing ihl, mask/shift the result, store the offset, etc.).
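
Expanded into basic IMR operations, that would be roughly (pseudo code):

    reg = load_u8(nh_offset + 0)    /* version/ihl byte */
    reg &= 0x0f                     /* ihl, in 32-bit words */
    reg <<= 2                       /* ihl * 4 = ipv4 header length in bytes */
    th_offset = nh_offset + reg     /* cached for later th accesses */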

I would also add:
- limit (to jit both xt_limit and nft limit)
- statistic (random sampling)

as both nft and iptables have those features and I think they're pretty
much the same, so they can be handled by common code.
Same for other extensions, but those have side effects and I don't know
yet how to handle them (log, counter and the like).

Same for set lookups.

In the iptables case, these will only be 'address/network/interface is
(not) in set', i.e. only match/no match, plus set manipulations (SET
target).

In the nftables case it can also be something like 'retrieve the value
belonging to a key' (dreg = lookup(map, sreg, sreg_len)), and then the
dreg value gets used in other ways.
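
On the ebpf side I'd expect both cases to boil down to a
bpf_map_lookup_elem() call, roughly (sketch only):

    val = bpf_map_lookup_elem(&set, &key);  /* key built from sreg */
    if (!val)
            goto next_rule;                 /* iptables: no match */
    /* nftables map: copy *val into dreg for the following expressions */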

I think that making the IMR handle the nftables case rather than just
the iptables one has the advantage that the IMR might be able to
optimize repetitive iptables rules (and nft rules that do not use a
jump map even though they could):

iptables .. -i eth0 -j ETH0_RULES
iptables .. -i eth1 -j ETH1_RULES
..
iptables .. -i ethX -j ETHX_RULES

could be transparently thrown into a map and the jump target retrieved
from it based on iifname...
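
Roughly (pseudo code):

    key = meta(iifname)
    target = lookup(iface_chains, key)  /* eth0 -> ETH0_RULES, ... */
    if (target != NULL)
            goto chain(target)  /* replaces N sequential ifname compares */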

The IMR would also need to be able to create/allocate such sets/maps
on its own.

Do you see this as something that has to be decided now?
My feeling about the IMR is that we can probably handle this later by
extending it rather than having to throw it away and rewrite it :)

> I think it would be good to have clear semantics in terms of what
> it would eventually abstract away from raw BPF when sitting on top
> of it; potentially these could be details on packet access or

[..]

Yes, it would be good if the IMR could e.g. merge adjacent loads:

load 4 bytes from network header at offset 12  (relop, equal) imm 0x1
load 4 bytes from network header at offset 16  (relop, equal) imm 0x2

 ..  would be condensed to ...

load 8 bytes from network header at offset 12  (relop, equal) imm 0x100000002

... before generating ebpf code.
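
In the generated program that merged compare could then become a single
wide load, something like this (direct packet access sketch, byte order
glossed over):

    /* the verifier demands a bounds check before the access */
    if (data + nh_off + 20 > data_end)
            return XDP_DROP;

    if (*(__u64 *)(data + nh_off + 12) == folded_imm)
            /* both original rules matched */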

I don't think this should be part of the frontend (too low-level a task
for the frontend) and I don't see why it should happen when generating
code (too high-level a task for that stage).

> interaction with helpers or other BPF features such that it's BPF prog
> type independent at this stage, e.g. another point could be that given
> the priv/cap level of the uapi request, there could also be different
> BPF code gen backends that implement against the IMR, e.g. when the
> request comes out of userns then it has feature constraints in terms
> of e.g. having to use bpf_skb_{load,store}_bytes() helpers for packet
> access instead of direct packet access or not being able to use BPF to
> BPF calls, etc; wdyt?

I haven't thought about this at all, but I think this is useful.

When initializing IMR state we could provide feature flag(s) that could
either instantly reject something (e.g. the context only allows
read-only access to packets but the IMR was just told to mangle
payload), or that we could consider during the code-gen phase (I don't
understand ebpf well enough to say whether that's useful or not, but if
it is then it would be good if the IMR could do it).
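
E.g. something like this (made-up names, just to illustrate the idea):

    enum imr_features {
            IMR_F_PKT_WRITE         = 1 << 0, /* payload stores allowed */
            IMR_F_DIRECT_PKT_ACCESS = 1 << 1, /* else bpf_skb_{load,store}_bytes() */
            IMR_F_BPF2BPF_CALLS     = 1 << 2, /* bpf-to-bpf calls available */
    };

    struct imr_state *imr_state_alloc(unsigned int features);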

I did not intend the IMR as a 'public' library -- it would only be for
use by in-tree umh translators.

So anything that needs changing at a later stage should be doable, but
obviously the less rework it needs later on, the better...

Thanks for reviewing!
