Message-ID: <20191125105337.GA14828@pc-9.home>
Date: Mon, 25 Nov 2019 11:53:37 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Björn Töpel <bjorn.topel@...il.com>
Cc: netdev@...r.kernel.org, ast@...nel.org,
Björn Töpel <bjorn.topel@...el.com>,
bpf@...r.kernel.org, magnus.karlsson@...il.com,
magnus.karlsson@...el.com, jonathan.lemon@...il.com,
ecree@...arflare.com, thoiland@...hat.com,
andrii.nakryiko@...il.com, tariqt@...lanox.com,
saeedm@...lanox.com, maximmi@...lanox.com
Subject: Re: [PATCH bpf-next v2 1/6] bpf: introduce BPF dispatcher
On Sat, Nov 23, 2019 at 08:12:20AM +0100, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@...el.com>
>
> The BPF dispatcher is a multiway branch code generator, mainly
> targeted for XDP programs. When an XDP program is executed via
> bpf_prog_run_xdp(), it is invoked via an indirect call. With
> retpolines enabled, the indirect call has a substantial performance
> impact. The dispatcher is a mechanism that transforms multiple
> indirect calls into direct calls, and therefore avoids the retpoline
> overhead. The dispatcher is generated using the BPF JIT, and relies
> on text poking provided by bpf_arch_text_poke().
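
To make sure I read the generator right: the emitted image is logically
just an open-coded binary search over the sorted program addresses,
ending in direct calls. A rough userspace C analogue, purely
illustrative with made-up names; it assumes the compiler happens to lay
out prog0..prog3 at ascending addresses:

  #include <stdint.h>

  typedef unsigned int (*xdp_func_t)(void *ctx);

  static unsigned int prog0(void *ctx) { (void)ctx; return 0; }
  static unsigned int prog1(void *ctx) { (void)ctx; return 1; }
  static unsigned int prog2(void *ctx) { (void)ctx; return 2; }
  static unsigned int prog3(void *ctx) { (void)ctx; return 3; }

  static unsigned int dispatch(void *ctx, xdp_func_t func)
  {
          uintptr_t f = (uintptr_t)func;

          /* Binary search over the sorted addresses: every hit is a
           * direct call, so the fast path pays no retpoline.
           */
          if (f <= (uintptr_t)prog1) {
                  if (f == (uintptr_t)prog0)
                          return prog0(ctx);      /* je func */
                  if (f == (uintptr_t)prog1)
                          return prog1(ctx);
          } else {
                  if (f == (uintptr_t)prog2)
                          return prog2(ctx);
                  if (f == (uintptr_t)prog3)
                          return prog3(ctx);
          }
          return func(ctx);       /* jmp thunk: retpolined fallback */
  }

  int main(void)
  {
          return (int)dispatch(0, prog2);         /* direct-call hit */
  }
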
>
> The dispatcher hijacks a trampoline function via the __fentry__ nop
> of the trampoline. One dispatcher instance currently supports up to
> 16 dispatch points. This can be extended in the future.
>
> An example: A module/driver allocates a dispatcher. The dispatcher
> is shared by all netdevs. Each unique XDP program has a slot in the
> dispatcher, registered by a netdev. The netdev then uses the
> dispatcher to call the correct program with a direct call.
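
If I follow the lifecycle correctly, the bookkeeping side is then
roughly the below userspace sketch: a fixed array of at most 16 slots,
kept sorted by address so the emitted binary search works, with the
image regenerated on every add/remove. All names here are mine, not
the patch's:

  #include <stdint.h>

  #define DISPATCH_SLOTS_MAX 16         /* "up to 16 dispatch points" */

  struct xdp_dispatcher {
          void *progs[DISPATCH_SLOTS_MAX]; /* sorted prog addresses */
          int num_progs;
          void *image;    /* re-JITed after each add/remove, then the
                           * trampoline's __fentry__ nop is poked to
                           * jump here */
  };

  static int xdp_dispatcher_add(struct xdp_dispatcher *d, void *prog)
  {
          int i;

          if (d->num_progs == DISPATCH_SLOTS_MAX)
                  return -1;              /* all slots in use */
          /* Insertion sort by address, so the generated dispatch code
           * can binary search the array.
           */
          for (i = d->num_progs;
               i > 0 && (uintptr_t)d->progs[i - 1] > (uintptr_t)prog;
               i--)
                  d->progs[i] = d->progs[i - 1];
          d->progs[i] = prog;
          d->num_progs++;
          /* ...re-JIT d->image and text-poke the trampoline... */
          return 0;
  }
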
>
> Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
[...]
> +static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
> +{
> + u8 *jg_reloc, *jg_target, *prog = *pprog;
> + int pivot, err, jg_bytes = 1, cnt = 0;
> + s64 jg_offset;
> +
> + if (a == b) {
> + /* Leaf node of recursion, i.e. not a range of indices
> + * anymore.
> + */
> + EMIT1(add_1mod(0x48, BPF_REG_3)); /* cmp rdx,func */
> + if (!is_simm32(progs[a]))
> + return -1;
> + EMIT2_off32(0x81, add_1reg(0xF8, BPF_REG_3),
> + progs[a]);
> + err = emit_cond_near_jump(&prog, /* je func */
> + (void *)progs[a], prog,
> + X86_JE);
> + if (err)
> + return err;
> +
> + err = emit_jump(&prog, /* jmp thunk */
> + __x86_indirect_thunk_rdx, prog);
> + if (err)
> + return err;
> +
> + *pprog = prog;
> + return 0;
> + }
> +
> + /* Not a leaf node, so we pivot, and recursively descend into
> + * the lower and upper ranges.
> + */
> + pivot = (b - a) / 2;
> + EMIT1(add_1mod(0x48, BPF_REG_3)); /* cmp rdx,func */
> + if (!is_simm32(progs[a + pivot]))
> + return -1;
> + EMIT2_off32(0x81, add_1reg(0xF8, BPF_REG_3), progs[a + pivot]);
> +
> + if (pivot > 2) { /* jg upper_part */
> + /* Require near jump. */
> + jg_bytes = 4;
> + EMIT2_off32(0x0F, X86_JG + 0x10, 0);
> + } else {
> + EMIT2(X86_JG, 0);
> + }
> + jg_reloc = prog;
> +
> + err = emit_bpf_dispatcher(&prog, a, a + pivot, /* emit lower_part */
> + progs);
> + if (err)
> + return err;
> +
> + /* Intel 64 and IA-32 Architectures Optimization Reference
> + * Manual, 3.4.1.5 Code Alignment, Assembly/Compiler Coding
> + * Rule 12. (M impact, H generality) All branch targets should
> + * be 16-byte aligned.
Isn't this section 3.4.1.4, rule 11, or are you reading a newer manual
than the one on the website [0]? :) Just wondering: in your IXIA tests,
did you see any noticeable slowdown when skipping the 16-byte
alignment, as the rest of the kernel does [1,2]?
[0] https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
[1] be6cb02779ca ("x86: Align jump targets to 1-byte boundaries")
[2] https://lore.kernel.org/patchwork/patch/560050/
> + */
> + jg_target = PTR_ALIGN(prog, 16);
> + if (jg_target != prog)
> + emit_nops(&prog, jg_target - prog);
> + jg_offset = prog - jg_reloc;
> + emit_code(jg_reloc - jg_bytes, jg_offset, jg_bytes);
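Side note for other readers: jg_reloc points just past the jg's
displacement field, which was emitted as 0 above, and this emit_code()
writes the now-known forward offset back into it, relative to the end
of the jg as x86 expects. A small userspace equivalent of that fixup,
names mine:

  #include <stdint.h>
  #include <string.h>

  static void backpatch(uint8_t *reloc, uint8_t *target, int bytes)
  {
          int64_t disp = target - reloc;  /* rel. to end of the insn */

          memcpy(reloc - bytes, &disp, bytes); /* LE, as on x86 */
  }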
> +
> + err = emit_bpf_dispatcher(&prog, a + pivot + 1, /* emit upper_part */
> + b, progs);
> + if (err)
> + return err;
> +
> + *pprog = prog;
> + return 0;
> +}
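To double check I understand the shape of the emitted image, here is a
small userspace toy mirroring the recursion above, printing the
compare/branch tree instead of emitting x86 (illustrative only):

  #include <stdio.h>

  static void emit_tree(int a, int b, int depth)
  {
          int pivot;

          if (a == b) {   /* leaf: cmp; je prog; jmp thunk */
                  printf("%*scmp func, progs[%d]; je progs[%d]; "
                         "jmp thunk\n", 2 * depth, "", a, a);
                  return;
          }
          pivot = (b - a) / 2;
          printf("%*scmp func, progs[%d]; jg upper\n",
                 2 * depth, "", a + pivot);
          emit_tree(a, a + pivot, depth + 1);     /* lower part */
          printf("%*supper:\n", 2 * depth, "");
          emit_tree(a + pivot + 1, b, depth + 1); /* upper part */
  }

  int main(void)
  {
          emit_tree(0, 15, 0);    /* all 16 slots populated */
          return 0;
  }

With all 16 slots in use this gives at most five compares before either
a direct call or the thunk fallback, matching the log2(n) depth one
would expect.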