Message-Id: <20191113204737.31623-1-bjorn.topel@gmail.com>
Date:   Wed, 13 Nov 2019 21:47:33 +0100
From:   Björn Töpel <bjorn.topel@...il.com>
To:     netdev@...r.kernel.org, ast@...nel.org, daniel@...earbox.net
Cc:     Björn Töpel <bjorn.topel@...il.com>,
        bpf@...r.kernel.org, magnus.karlsson@...il.com,
        magnus.karlsson@...el.com, jonathan.lemon@...il.com
Subject: [RFC PATCH bpf-next 0/4] Introduce xdp_call.h and the BPF dispatcher

This RFC(!) introduces the BPF dispatcher and xdp_call.h: a
mechanism that avoids the retpoline overhead by text-poking/rewriting
indirect calls into direct calls.

The ideas build on Alexei's V3 of the BPF trampoline work, namely:
  * Use the existing BPF JIT infrastructure to generate code
  * Use bpf_arch_text_poke() to modify the kernel text (sketched below)
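
For a feel of that second bullet: a minimal sketch of retargeting a
dispatcher call site, assuming the bpf_arch_text_poke() signature from
the trampoline series [1]. The wrapper name and the BPF_MOD_JUMP poke
type are my illustration here (patch 1 teaches the poker about jumps;
the actual enum name may differ):

/* Illustrative sketch only: retarget the branch at a dispatcher
 * call site (ip) from the old jited image to a freshly generated
 * one. bpf_arch_text_poke() comes from the trampoline series [1].
 */
static int dispatcher_retarget(void *ip, void *old_image, void *new_image)
{
	return bpf_arch_text_poke(ip, BPF_MOD_JUMP, old_image, new_image);
}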

To try the series out, you'll need V3 of the BPF trampoline work [1].

The main idea: each XDP call site calls the jited dispatch table
instead of doing an indirect call. The dispatch table calls the XDP
programs directly. In pseudo code it would be something similar to
this (a driver-side sketch follows after it):

unsigned int do_call(struct bpf_prog *prog, struct xdp_buff *xdp)
{
	if (prog == PROG1)
		return call_direct_PROG1(xdp);
	if (prog == PROG2)
		return call_direct_PROG2(xdp);
	/* Not in the table: fall back to the retpolined call. */
	return indirect_call(prog, xdp);
}
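
On the driver side, the intent is that xdp_call.h wraps that call
site. A hedged sketch with hypothetical macro names; the real
interface is in patch 3 (include/linux/xdp_call.h):

#include <linux/xdp_call.h>	/* added in patch 3 */

/* Hypothetical names: one dispatcher instance per driver, plus a
 * run macro that uses the direct call when prog is in the table,
 * and the plain indirect (retpolined) call otherwise.
 */
DEFINE_XDP_CALL(i40e_xdp_call);

static u32 i40e_run_xdp_prog(struct bpf_prog *prog, struct xdp_buff *xdp)
{
	return xdp_call_run(i40e_xdp_call, prog, xdp);
}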

The current dispatcher supports four entries. It could support more,
but I don't know if it's really practical (...and I was lazy -- more
than 4 entries meant moving to Jcc with >1 B displacements. :-P). The
dispatcher is re-generated for each new XDP program/entry; a sketch
of that flow is below. The upper limit of four in this series means
that if six i40e netdevs have an XDP program running, the fifth and
sixth will fall back to an indirect call.
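
To make the regeneration concrete, here's a sketch of the flow; every
name below is illustrative, not the actual API from patch 2
(kernel/bpf/dispatcher.c):

#define DISPATCHER_MAX 4	/* the limit discussed above */

/* Illustrative only; patch 2 is the real thing. */
struct xdp_dispatcher_sketch {
	struct bpf_prog *progs[DISPATCHER_MAX];
	int num_progs;
};

/* Add a program and re-generate the compare-and-call chain. Past
 * DISPATCHER_MAX the call site keeps the indirect (retpoline)
 * path, which is what scenario 4 below measures.
 */
static int dispatcher_add_prog(struct xdp_dispatcher_sketch *d,
			       struct bpf_prog *prog)
{
	if (d->num_progs >= DISPATCHER_MAX)
		return -ENOSPC;
	d->progs[d->num_progs++] = prog;
	/* re-jit the dispatch image over d->progs[] here */
	return 0;
}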

Now to the performance numbers. I ran this on my 3 GHz Skylake, with
64 B UDP packets sent to the i40e at ~40 Mpps.

Benchmark:
  # ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP

  1. Baseline:              26.0 Mpps
  2. Dispatcher 1 entry:    35.5 Mpps (+36.5%)
  3. Dispatcher 4 entries:  32.9 Mpps (+26.5%)
  4. Dispatcher 5 entries:  24.2 Mpps (-6.9%)

Scenario 4 is the case where the benchmark uses the dispatcher, but
the table is full, so the program is not in it. This means that the
caller pays for the dispatching *and* the retpoline.

Is this a good idea? The performance is nice! Can it be done in a
better way? Useful for other BPF programs? I would love your input!


Thanks!
Björn

[1] https://patchwork.ozlabs.org/cover/1191672/

Björn Töpel (4):
  bpf: teach bpf_arch_text_poke() jumps
  bpf: introduce BPF dispatcher
  xdp: introduce xdp_call
  i40e: start using xdp_call.h

 arch/x86/net/bpf_jit_comp.c                 | 130 ++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_main.c |   5 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |   5 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  |   5 +-
 include/linux/bpf.h                         |   3 +
 include/linux/xdp_call.h                    |  49 +++++
 kernel/bpf/Makefile                         |   1 +
 kernel/bpf/dispatcher.c                     | 197 ++++++++++++++++++++
 8 files changed, 388 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/xdp_call.h
 create mode 100644 kernel/bpf/dispatcher.c

-- 
2.20.1
