Date:   Fri, 16 Feb 2018 14:40:19 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     netdev@...r.kernel.org
Cc:     netfilter-devel@...r.kernel.org, davem@...emloft.net,
        alexei.starovoitov@...il.com,
        Daniel Borkmann <daniel@...earbox.net>
Subject: [PATCH RFC 0/4] net: add bpfilter

This is a very rough and early proof of concept that implements bpfilter.
The basic idea of bpfilter is that it can process iptables queries and
translate them in user space into BPF programs which can then get attached
at various locations. For simplicity, in this RFC we demo attaching them
to the XDP layer, but any other location would work as well (e.g. the tc
sch_clsact ingress/egress hook or any other/new hook with equivalent
semantics).

As a benefit of such a design, we get BPF JIT compilation on x86_64,
arm64, ppc64, sparc64, mips64, s390x and arm32, as well as rule offloading
into HW for free on Netronome NFP SmartNICs, which are already capable of
offloading BPF, since we can reuse all existing BPF infrastructure as the
back end. The user space iptables binary issuing rule additions or dumps
was left as-is, so in the long term any binary using the iptables uapi
kernel interface could be supported transparently in this manner.

As rule translation can potentially become very complex, it is performed
entirely in user space. In order to ease deployment, the request_module()
code is extended to allow user mode helpers to be invoked. The idea is that
user mode helpers are built as part of the kernel build and installed like
traditional kernel modules, with a .ko file extension, into the
distro-specified location, such that from a distribution point of view they
are no different from regular kernel modules. Thus, the request_module()
logic is allowed to load such user mode helper (umh) binaries via:

  request_module("foo") ->
    call_umh("modprobe foo") ->
      sys_finit_module(FD of /lib/modules/.../foo.ko) ->
        call_umh(struct file)

Such an approach enables the kernel to delegate functionality traditionally
done by kernel modules to user space processes (either root or !root) and
reduces the security attack surface of such new code, meaning that in case
of potential bugs only the umh would crash, not the kernel. Another
advantage is that bpfilter.ko can be debugged and tested out of user space
as well (e.g. opening the possibility to run all clang sanitizers, fuzzers
or test suites for checking the translation). Also, such an architecture
makes the kernel/user boundary very precise: requests can be handled and
translated to BPF in the control plane part in user space with its own
user memory etc, while only minimal data plane bits are in the kernel. It
would also allow removing old xtables modules from the kernel at some
point while keeping the functionality in place.
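
As a rough illustration of the loading path above: finit_module() receives
a file that may be either a regular module or a umh binary, so something
has to tell the two apart. The sketch below assumes the e_type field of the
ELF header is used for that (relocatable object vs. executable). The cover
letter does not state the exact criterion, and load_path() is a
hypothetical helper for illustration, not a kernel function.

```python
import struct

ET_REL, ET_EXEC, ET_DYN = 1, 2, 3  # e_type values from the ELF specification


def elf_type(header: bytes) -> int:
    """Return e_type from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    # EI_DATA (byte 5 of e_ident): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_type directly follows the 16-byte e_ident array.
    return struct.unpack_from(fmt, header, 16)[0]


def load_path(header: bytes) -> str:
    """Hypothetical dispatch: relocatable -> module loader, executable -> umh."""
    kind = elf_type(header)
    if kind == ET_REL:
        return "kernel module"
    if kind in (ET_EXEC, ET_DYN):
        return "user mode helper"
    raise ValueError("unsupported ELF type")
```

With a dispatch of this kind, modprobe and the distro packaging would not
need to know which flavour of .ko they are handling.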

In the implemented proof of concept we show that simple /32 src/dst IPs
are translated in such a manner. More complex rules would be added later,
as would different BPF code generation backends that can be selected for
the various attachment points, a proper encoder/decoder for the uapi
requests, etc. This just starts out very simple and basic for the sake
of an early RFC to demo the idea.

In the examples below, we show that dumping, loading and offloading of
one or multiple simple rules works. We also show the bpftool XDP dump of
the generated BPF instruction sequence as well as a simple functional ping
test demonstrating that the policy is enforced.

Set rebased on top of 255442c93843 ("Merge tag 'docs-4.16' of [...]").

Feedback very welcome!

Various bpfilter usage examples from the PoC code:

1) Dumping current rules:

  # iptables -t filter -L
  Chain INPUT (policy ACCEPT)
  target     prot opt source               destination

  Chain FORWARD (policy ACCEPT)
  target     prot opt source               destination

  Chain OUTPUT (policy ACCEPT)
  target     prot opt source               destination

2) ping test:

  # ping -c 1 127.0.0.1 -I 127.0.0.2
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.2 : 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.040 ms

    --- 127.0.0.1 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.040/0.040/0.040/0.000 ms

3) Adding & dumping a simple rule:

  # iptables -t filter -A INPUT -i lo -s 127.0.0.2/32 -d 127.0.0.1/32 -j DROP
  # iptables -t filter -L
  Chain INPUT (policy ACCEPT)
  target     prot opt source               destination
  DROP       all  --  127.0.0.2            localhost

  Chain FORWARD (policy ACCEPT)
  target     prot opt source               destination

  Chain OUTPUT (policy ACCEPT)
  target     prot opt source               destination

4) Dump BPF generated code for that rule (on lo it's XDP generic, otherwise
   native XDP for XDP supported drivers):

  # bpftool p
    18: xdp  tag 6b07f663830d5b0c
        loaded_at Feb 14/01:15  uid 0
        xlated 208B  not jited  memlock 4096B
  # bpftool p d x i 18
   0: (bf) r9 = r1
   1: (79) r2 = *(u64 *)(r9 +0)
   2: (79) r3 = *(u64 *)(r9 +8)
   3: (bf) r1 = r2
   4: (07) r1 += 14
   5: (bd) if r1 <= r3 goto pc+2
   6: (b4) (u32) r0 = (u32) 2
   7: (95) exit
   8: (bf) r1 = r2
   9: (b4) (u32) r5 = (u32) 0
  10: (69) r4 = *(u16 *)(r1 +12)
  11: (55) if r4 != 0x8 goto pc+9
  12: (07) r1 += 34
  13: (2d) if r1 > r3 goto pc+7
  14: (07) r1 += -20
  15: (61) r4 = *(u32 *)(r1 +12)
  16: (55) if r4 != 0x200007f goto pc+1
  17: (04) (u32) r5 += (u32) 1
  18: (61) r4 = *(u32 *)(r1 +16)
  19: (55) if r4 != 0x100007f goto pc+1
  20: (04) (u32) r5 += (u32) 1
  21: (55) if r5 != 0x2 goto pc+2
  22: (b4) (u32) r0 = (u32) 1
  23: (95) exit
  24: (b4) (u32) r0 = (u32) 2
  25: (95) exit
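
The constants in the dump above can be reproduced by hand. The loads at
r1 +12 and r1 +16 (after r1 has been rewound to the start of the IPv4
header) read the source and destination address, and the u16 load at
offset 12 of the frame reads the EtherType. These fields are in network
byte order, so on a little-endian host the loaded values appear
byte-swapped. The sketch below assumes a little-endian machine such as
x86_64; the helper names are made up for illustration.

```python
import socket
import struct


def ipv4_imm(addr: str) -> int:
    """Value a little-endian 32-bit load yields for a network-order address."""
    return struct.unpack("<I", socket.inet_aton(addr))[0]


def ethertype_imm(ethertype: int) -> int:
    """Value a little-endian 16-bit load yields for a network-order EtherType."""
    return struct.unpack("<H", struct.pack(">H", ethertype))[0]


print(hex(ipv4_imm("127.0.0.2")))   # 0x200007f (source address, insn 16)
print(hex(ipv4_imm("127.0.0.1")))   # 0x100007f (destination address, insn 19)
print(hex(ethertype_imm(0x0800)))   # 0x8       (IPv4 EtherType, insn 11)
```

The return values in r0 are the XDP action codes: 1 is XDP_DROP and 2 is
XDP_PASS.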

5) ping test:

  # ping -c 1 127.0.0.1 -I 127.0.0.2
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.2 : 56(84) bytes of data.

    --- 127.0.0.1 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

  # ping -c 1 127.0.0.1 -I 127.0.0.1
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 : 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.018 ms

    --- 127.0.0.1 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.018/0.018/0.018/0.000 ms

  # ping -c 1 127.0.0.2 -I 127.0.0.2
    PING 127.0.0.2 (127.0.0.2) from 127.0.0.2 : 56(84) bytes of data.
    64 bytes from 127.0.0.2: icmp_seq=1 ttl=64 time=0.018 ms

    --- 127.0.0.2 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.018/0.018/0.018/0.000 ms

6) Adding & dumping a 2nd and 3rd rule:

  # iptables -t filter -A INPUT -i lo -s 127.0.0.4/32 -d 127.0.0.3/32 -j DROP
  # iptables -t filter -A INPUT -i lo -s 127.0.0.5/32 -j DROP
  # iptables -t filter -L
  Chain INPUT (policy ACCEPT)
  target     prot opt source               destination
  DROP       all  --  127.0.0.2            localhost
  DROP       all  --  127.0.0.4            127.0.0.3
  DROP       all  --  127.0.0.5            anywhere

  Chain FORWARD (policy ACCEPT)
  target     prot opt source               destination

  Chain OUTPUT (policy ACCEPT)
  target     prot opt source               destination

7) Dump BPF generated code again:

  # bpftool p
    20: xdp  tag 19519bdd253cbfe5
        loaded_at Feb 14/01:17  uid 0
        xlated 440B  not jited  memlock 4096B
  # bpftool p d x i 20
   0: (bf) r9 = r1
   1: (79) r2 = *(u64 *)(r9 +0)
   2: (79) r3 = *(u64 *)(r9 +8)
   3: (bf) r1 = r2
   4: (07) r1 += 14
   5: (bd) if r1 <= r3 goto pc+2
   6: (b4) (u32) r0 = (u32) 2
   7: (95) exit
   8: (bf) r1 = r2
   9: (b4) (u32) r5 = (u32) 0
  10: (69) r4 = *(u16 *)(r1 +12)
  11: (55) if r4 != 0x8 goto pc+9
  12: (07) r1 += 34
  13: (2d) if r1 > r3 goto pc+7
  14: (07) r1 += -20
  15: (61) r4 = *(u32 *)(r1 +12)
  16: (55) if r4 != 0x200007f goto pc+1
  17: (04) (u32) r5 += (u32) 1
  18: (61) r4 = *(u32 *)(r1 +16)
  19: (55) if r4 != 0x100007f goto pc+1
  20: (04) (u32) r5 += (u32) 1
  21: (55) if r5 != 0x2 goto pc+2
  22: (b4) (u32) r0 = (u32) 1
  23: (95) exit
  24: (bf) r1 = r2
  25: (b4) (u32) r5 = (u32) 0
  26: (69) r4 = *(u16 *)(r1 +12)
  27: (55) if r4 != 0x8 goto pc+9
  28: (07) r1 += 34
  29: (2d) if r1 > r3 goto pc+7
  30: (07) r1 += -20
  31: (61) r4 = *(u32 *)(r1 +12)
  32: (55) if r4 != 0x400007f goto pc+1
  33: (04) (u32) r5 += (u32) 1
  34: (61) r4 = *(u32 *)(r1 +16)
  35: (55) if r4 != 0x300007f goto pc+1
  36: (04) (u32) r5 += (u32) 1
  37: (55) if r5 != 0x2 goto pc+2
  38: (b4) (u32) r0 = (u32) 1
  39: (95) exit
  40: (bf) r1 = r2
  41: (b4) (u32) r5 = (u32) 0
  42: (69) r4 = *(u16 *)(r1 +12)
  43: (55) if r4 != 0x8 goto pc+6
  44: (07) r1 += 34
  45: (2d) if r1 > r3 goto pc+4
  46: (07) r1 += -20
  47: (61) r4 = *(u32 *)(r1 +12)
  48: (55) if r4 != 0x500007f goto pc+1
  49: (04) (u32) r5 += (u32) 1
  50: (55) if r5 != 0x1 goto pc+2
  51: (b4) (u32) r0 = (u32) 1
  52: (95) exit
  53: (b4) (u32) r0 = (u32) 2
  54: (95) exit
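
To make the control flow of the dump above easier to follow, here is a
small Python model of it. This is only a reading of the dumped
instructions, not the PoC code generator: each rule resets a hit counter
(r5), bumps it once per matching field, and drops the packet if the
counter equals the number of fields the rule tests. The model assumes
every rule has a DROP target and tests at least one address, as in the
examples here. run() and make_frame() are names invented for this sketch.

```python
import socket

XDP_DROP, XDP_PASS = 1, 2   # values placed in r0 by the dumped program
ETH_HLEN, IP_HLEN = 14, 20  # header sizes the bounds checks rely on


def run(pkt: bytes, rules) -> int:
    """rules: list of (src, dst) dotted quads, None meaning 'not tested'."""
    if len(pkt) < ETH_HLEN:                  # insns 3-7: frame too short
        return XDP_PASS
    for src, dst in rules:
        wanted = sum(x is not None for x in (src, dst))
        hits = 0                             # r5
        is_ipv4 = pkt[12:14] == b"\x08\x00"  # EtherType check
        if is_ipv4 and len(pkt) >= ETH_HLEN + IP_HLEN:
            saddr = pkt[ETH_HLEN + 12:ETH_HLEN + 16]
            daddr = pkt[ETH_HLEN + 16:ETH_HLEN + 20]
            if src is not None and saddr == socket.inet_aton(src):
                hits += 1
            if dst is not None and daddr == socket.inet_aton(dst):
                hits += 1
        if hits == wanted:                   # every tested field matched
            return XDP_DROP
    return XDP_PASS                          # fall through: default accept


def make_frame(src: str, dst: str) -> bytes:
    """Minimal Ethernet + IPv4 header carrying only the fields tested above."""
    eth = bytes(12) + b"\x08\x00"
    ip = bytes(12) + socket.inet_aton(src) + socket.inet_aton(dst)
    return eth + ip


RULES = [("127.0.0.2", "127.0.0.1"), ("127.0.0.4", "127.0.0.3"),
         ("127.0.0.5", None)]
print(run(make_frame("127.0.0.2", "127.0.0.1"), RULES))  # 1
print(run(make_frame("127.0.0.4", "127.0.0.4"), RULES))  # 2
```

Note that a ping can also fail because the echo reply, rather than the
request, matches a rule: a frame from 127.0.0.3 to 127.0.0.4 passes in
this model, while the answer from 127.0.0.4 to 127.0.0.3 is dropped by
the second rule.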

8) ping test again:

  # ping -c 1 127.0.0.4 -I 127.0.0.4
    PING 127.0.0.4 (127.0.0.4) from 127.0.0.4 : 56(84) bytes of data.
    64 bytes from 127.0.0.4: icmp_seq=1 ttl=64 time=0.032 ms

    --- 127.0.0.4 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.032/0.032/0.032/0.000 ms

  # ping -c 1 127.0.0.4 -I 127.0.0.3
    PING 127.0.0.4 (127.0.0.4) from 127.0.0.3 : 56(84) bytes of data.

    --- 127.0.0.4 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

  # ping -c 1 127.0.0.1 -I 127.0.0.2
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.2 : 56(84) bytes of data.

    --- 127.0.0.1 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

  # ping -c 1 127.0.0.1 -I 127.0.0.5
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.5 : 56(84) bytes of data.

    --- 127.0.0.1 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

9) Example test with offload into an nfp device:

  # ethtool -i enp2s0
    driver: nfp
    version: 4.15.0+ SMP mod_unload
    firmware-version: 0.0.5.5 0.17 bpf_xxxxxxx ebpf
    expansion-rom-version:
    bus-info: 0000:02:00.0
    supports-statistics: yes
    supports-test: no
    supports-eeprom-access: no
    supports-register-dump: yes
    supports-priv-flags: no

  # iptables -t filter -A INPUT -i enp2s0 -s 192.168.2.2/32 -j DROP

  # bpftool p
  1: xdp  tag 88896d0ae0f463a6 dev enp2s0  ( <-- offloaded into HW )
        loaded_at Feb 15/14:30  uid 0
        xlated 184B  jited 640B  memlock 4096B
  # bpftool p d x i 1
   0: (bf) r9 = r1
   1: (79) r2 = *(u64 *)(r9 +0)
   2: (79) r3 = *(u64 *)(r9 +8)
   3: (bf) r1 = r2
   4: (07) r1 += 14
   5: (bd) if r1 <= r3 goto pc+2
   6: (b4) (u32) r0 = (u32) 2
   7: (95) exit
   8: (bf) r1 = r2
   9: (b4) (u32) r5 = (u32) 0
  10: (69) r4 = *(u16 *)(r1 +12)
  11: (55) if r4 != 0x8 goto pc+6
  12: (07) r1 += 34
  13: (2d) if r1 > r3 goto pc+4
  14: (07) r1 += -20
  15: (61) r4 = *(u32 *)(r1 +12)
  16: (55) if r4 != 0x202a8c0 goto pc+1
  17: (04) (u32) r5 += (u32) 1
  18: (55) if r5 != 0x1 goto pc+2
  19: (b4) (u32) r0 = (u32) 1
  20: (95) exit
  21: (b4) (u32) r0 = (u32) 2
  22: (95) exit
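
As a cross-check of the three dumps, the "xlated" sizes reported by
bpftool follow from the shape of the generated code, given that one eBPF
instruction is 8 bytes. The per-section instruction counts below are
simply read off the dumps in this mail (they describe this PoC output
only, not a stable property of bpfilter), and the function names are made
up for the sketch.

```python
INSN_BYTES = 8     # size of one eBPF instruction
PROLOGUE = 8       # load data/data_end, Ethernet bounds check, early pass
RULE_SETUP = 7     # reset r1/r5, EtherType check, IPv4 bounds check, rewind
PER_MATCH = 3      # load field, compare, bump r5
RULE_VERDICT = 3   # compare r5 with match count, r0 = 1, exit
EPILOGUE = 2       # r0 = 2, exit


def insn_count(matches_per_rule):
    """Number of instructions for rules testing the given number of fields."""
    rules = sum(RULE_SETUP + PER_MATCH * m + RULE_VERDICT
                for m in matches_per_rule)
    return PROLOGUE + rules + EPILOGUE


def xlated_bytes(matches_per_rule):
    return INSN_BYTES * insn_count(matches_per_rule)


def skip_offsets(matches):
    """Jump offsets of the EtherType check and the IPv4 bounds check."""
    return 3 + PER_MATCH * matches, 1 + PER_MATCH * matches


print(xlated_bytes([2]))        # 208 (example 4)
print(xlated_bytes([2, 2, 1]))  # 440 (example 7)
print(xlated_bytes([1]))        # 184 (example 9)
print(skip_offsets(2))          # (9, 7), i.e. "goto pc+9" and "goto pc+7"
```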

Thanks!

Alexei Starovoitov (2):
  modules: allow insmod load regular elf binaries
  bpf: introduce bpfilter commands

Daniel Borkmann (1):
  bpf: rough bpfilter codegen example hack

David S. Miller (1):
  net: initial bpfilter skeleton

 fs/exec.c                     |  40 ++++-
 include/linux/binfmts.h       |   1 +
 include/linux/bpfilter.h      |  13 ++
 include/linux/umh.h           |   4 +
 include/uapi/linux/bpf.h      |  31 ++++
 include/uapi/linux/bpfilter.h | 200 ++++++++++++++++++++++
 kernel/bpf/syscall.c          |  52 ++++++
 kernel/module.c               |  33 +++-
 kernel/umh.c                  |  24 ++-
 net/Kconfig                   |   2 +
 net/Makefile                  |   1 +
 net/bpfilter/Kconfig          |   7 +
 net/bpfilter/Makefile         |   9 +
 net/bpfilter/bpfilter.c       | 106 ++++++++++++
 net/bpfilter/bpfilter_mod.h   | 373 ++++++++++++++++++++++++++++++++++++++++++
 net/bpfilter/ctor.c           |  91 +++++++++++
 net/bpfilter/gen.c            | 290 ++++++++++++++++++++++++++++++++
 net/bpfilter/init.c           |  36 ++++
 net/bpfilter/sockopt.c        | 236 ++++++++++++++++++++++++++
 net/bpfilter/tables.c         |  73 +++++++++
 net/bpfilter/targets.c        |  51 ++++++
 net/bpfilter/tgts.c           |  26 +++
 net/ipv4/Makefile             |   2 +
 net/ipv4/bpfilter/Makefile    |   2 +
 net/ipv4/bpfilter/sockopt.c   |  64 ++++++++
 net/ipv4/ip_sockglue.c        |  17 ++
 26 files changed, 1767 insertions(+), 17 deletions(-)
 create mode 100644 include/linux/bpfilter.h
 create mode 100644 include/uapi/linux/bpfilter.h
 create mode 100644 net/bpfilter/Kconfig
 create mode 100644 net/bpfilter/Makefile
 create mode 100644 net/bpfilter/bpfilter.c
 create mode 100644 net/bpfilter/bpfilter_mod.h
 create mode 100644 net/bpfilter/ctor.c
 create mode 100644 net/bpfilter/gen.c
 create mode 100644 net/bpfilter/init.c
 create mode 100644 net/bpfilter/sockopt.c
 create mode 100644 net/bpfilter/tables.c
 create mode 100644 net/bpfilter/targets.c
 create mode 100644 net/bpfilter/tgts.c
 create mode 100644 net/ipv4/bpfilter/Makefile
 create mode 100644 net/ipv4/bpfilter/sockopt.c

-- 
2.9.5
