Message-ID: <5d94d3c5a238f_22502b00ea21a5b4e9@john-XPS-13-9370.notmuch>
Date: Wed, 02 Oct 2019 09:43:49 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>
Cc: Alexei Starovoitov <ast@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
Marek Majkowski <marek@...udflare.com>,
Lorenz Bauer <lmb@...udflare.com>,
David Miller <davem@...emloft.net>,
Jesper Dangaard Brouer <brouer@...hat.com>,
netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: RE: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single
interface through chain calls
Toke Høiland-Jørgensen wrote:
> This series adds support for executing multiple XDP programs on a single
> interface in sequence, through the use of chain calls, as discussed at the Linux
> Plumbers Conference last month:
>
> https://linuxplumbersconf.org/event/4/contributions/460/
>
> # HIGH-LEVEL IDEA
>
> The basic idea is to express the chain call sequence through a special map type,
> which contains a mapping from a (program, return code) tuple to another program
> to run next in the sequence. Userspace can populate this map to express
> arbitrary call sequences, and update the sequence by updating or replacing the
> map.
>
> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> which will look up the chain sequence map, and if found, will loop through calls
> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> previous program ID and return code.
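
To make the lookup loop described above concrete, here is a minimal user-space
sketch in C. It is not the code from the patches: struct chain_map, run_chain(),
MAX_PROGS, MAX_CHAIN, XDP_NUM_ACTIONS and the toy program table are hypothetical
names, a plain 2D array stands in for the new map type, and program ID 0 is
assumed to mean "end of chain".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XDP verdicts, numbered as in the kernel's enum xdp_action.
 * XDP_NUM_ACTIONS is a local helper for this sketch, not a kernel name. */
enum { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT, XDP_NUM_ACTIONS };

#define MAX_PROGS 8   /* hypothetical size of the toy program table */
#define MAX_CHAIN 32  /* hypothetical bound on the number of chained calls */

typedef int (*prog_fn)(void *pkt);

/* Hypothetical stand-in for the chain call map:
 * (program ID, return code) -> ID of the next program, 0 = end of chain. */
struct chain_map {
    uint32_t next[MAX_PROGS][XDP_NUM_ACTIONS];
};

static int prog_pass(void *pkt) { (void)pkt; return XDP_PASS; }
static int prog_drop(void *pkt) { (void)pkt; return XDP_DROP; }

/* Toy program table indexed by ID; ID 0 is reserved as "no program". */
static const prog_fn progs[MAX_PROGS] = { [1] = prog_pass, [2] = prog_drop };

/* Run first_id, then keep looking up (previous ID, return code) until
 * the map has no successor. Returns the verdict of the last program run. */
static int run_chain(const struct chain_map *map, uint32_t first_id, void *pkt)
{
    uint32_t id = first_id;
    int ret = XDP_ABORTED;

    assert(map != NULL);
    for (int i = 0; i < MAX_CHAIN; i++) {
        if (id == 0 || id >= MAX_PROGS || progs[id] == NULL)
            break;                      /* no (further) program to run */
        ret = progs[id](pkt);
        if (ret < 0 || ret >= XDP_NUM_ACTIONS)
            break;                      /* unknown verdict: stop chaining */
        id = map->next[id][ret];        /* keyed on (program, return code) */
    }
    return ret;
}
```

With next[1][XDP_PASS] set to 2, run_chain(&map, 1, pkt) runs the pass program
and then the drop program, which mirrors the two-program test case in the
performance section. With an all-zero map only the first program runs.
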
>
> An XDP chain call map can be installed on an interface by means of a new netlink
> attribute containing an fd pointing to a chain call map. This can be supplied
> along with the XDP prog fd, so that a chain map is always installed together
> with an XDP program.
>
> # PERFORMANCE
>
> I performed a simple performance test to get an initial feel for the overhead of
> the chain call mechanism. This test consists of running only two programs in
> sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
> measure the drop PPS performance and compare it to a baseline of just a single
> program that only returns XDP_DROP.
>
> For comparison, a test case that uses regular eBPF tail calls to sequence two
> programs together is also included. Finally, because 'perf' showed that the
> hashmap lookup was the largest single source of overhead, I also added a test
> case where I removed the jhash() call from the hashmap code, and just use the
> u32 key directly as an index into the hash bucket structure.
>
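
The "no jhash" variant described above can be pictured with the following
sketch. It assumes a power-of-two bucket count selected by masking, and mix32()
is only a stand-in mixer, not the kernel's jhash(); both function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in integer mixer (multiplicative hashing). NOT the kernel's jhash(). */
static uint32_t mix32(uint32_t key)
{
    return key * 2654435761u;
}

/* Regular path: hash the key first, then mask down to a bucket index. */
static uint32_t bucket_hashed(uint32_t key, uint32_t n_buckets)
{
    assert(n_buckets != 0 && (n_buckets & (n_buckets - 1)) == 0);
    return mix32(key) & (n_buckets - 1);
}

/* "No jhash" path: use the u32 key itself as the bucket index. */
static uint32_t bucket_direct(uint32_t key, uint32_t n_buckets)
{
    assert(n_buckets != 0 && (n_buckets & (n_buckets - 1)) == 0);
    return key & (n_buckets - 1);
}
```

The direct variant saves the hashing work on every lookup, at the cost of
relying on the keys themselves being well spread: keys that differ only above
the mask (5 and 21 with 16 buckets, for example) land in the same bucket.
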
> The performance for these different cases is as follows (with retpolines disabled):

Numbers with retpolines enabled would also be interesting.

>
> | Test case | Perf | Add. overhead | Total overhead |
> |---------------------------------+-----------+---------------+----------------|
> | Before patch (XDP DROP program) | 31.0 Mpps | | |
> | After patch (XDP DROP program) | 28.9 Mpps | 2.3 ns | 2.3 ns |

IMO even 1 Mpps overhead is too much for a feature that is primarily about
ease of use. Sacrificing performance to make userland a bit easier is hard
to justify for me when XDP _is_ singularly about performance. Also, that is
nearly 10% overhead, which is fairly large. So I think going forward the
performance gap needs to be removed.

> | XDP tail call | 26.6 Mpps | 3.0 ns | 5.3 ns |
> | XDP chain call (no jhash) | 19.6 Mpps | 13.4 ns | 18.7 ns |
> | XDP chain call (this series) | 17.0 Mpps | 7.9 ns | 26.6 ns |
>
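
The nanosecond columns in the table follow from the packet rates: at R Mpps
each packet takes 1000 / R ns. The "Add. overhead" column appears to be
measured against the row above it and "Total overhead" against the unpatched
baseline. A small sketch of that arithmetic (the helper names are hypothetical):

```c
#include <assert.h>

/* Per-packet time in nanoseconds at a rate given in millions of packets/s. */
static double ns_per_packet(double mpps)
{
    assert(mpps > 0.0);
    return 1000.0 / mpps;
}

/* Extra per-packet cost of running at `mpps` instead of `baseline_mpps`. */
static double overhead_ns(double baseline_mpps, double mpps)
{
    return ns_per_packet(mpps) - ns_per_packet(baseline_mpps);
}
```

For example, overhead_ns(31.0, 28.9) is about 2.3 ns, and overhead_ns(31.0, 17.0)
is about 26.6 ns, matching the first and last "Total overhead" entries.
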
> From this it is clear that there is some overhead from this mechanism, but
> the jhash removal example indicates that it is probably possible to optimise the
> code to the point where the overhead becomes low enough that it is acceptable.

I'm missing why, at least 'in theory', this can't be made as fast as tail calls?
Again, I can't see why someone would lose 30% of their performance when a userland
program could populate a tail call map for the same effect. Sure, userland would
also have to enforce some program standards/conventions, but it could be done, and
at 30% overhead that pain is probably worth it IMO.

My thinking, though, is that if we are a bit clever, chaining and tail calls could
be performance-wise equivalent?

I'll go read the patches now ;)

.John

>
> # PATCH SET STRUCTURE
>
> This series is structured as follows:
>
> - Patch 1: Prerequisite
> - Patch 2: New map type
> - Patch 3: Netlink hooks to install the chain call map
> - Patch 4: Core chain call logic
> - Patch 5-7: Bookkeeping updates to tools
> - Patch 8: Libbpf support for installing chain call maps
> - Patch 9: Selftests with example user space code
>
> The whole series is also available in my git repo on kernel.org:
> https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=xdp-multiprog-01
>