Date:   Wed, 2 Oct 2019 21:15:22 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Toke Høiland-Jørgensen <toke@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        Marek Majkowski <marek@...udflare.com>,
        Lorenz Bauer <lmb@...udflare.com>,
        David Miller <davem@...emloft.net>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single
 interface through chain calls

On Wed, Oct 02, 2019 at 09:43:49AM -0700, John Fastabend wrote:
> Toke Høiland-Jørgensen wrote:
> > This series adds support for executing multiple XDP programs on a single
> > interface in sequence, through the use of chain calls, as discussed at the Linux
> > Plumbers Conference last month:
> > 
> > https://linuxplumbersconf.org/event/4/contributions/460/
> > 
> > # HIGH-LEVEL IDEA
> > 
> > The basic idea is to express the chain call sequence through a special map type,
> > which contains a mapping from a (program, return code) tuple to another program
> > to run next in the sequence. Userspace can populate this map to express
> > arbitrary call sequences, and update the sequence by updating or replacing the
> > map.
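> > 
> > Conceptually, the key is a (program ID, return code) pair and the value
> > identifies the next program to run; something like the following (field
> > names are illustrative, not necessarily what the patches use):
> > 
> >     struct xdp_chain_key {
> >             __u32 prog_id;  /* ID of the program that just ran */
> >             __u32 retcode;  /* its return code, e.g. XDP_PASS */
> >     };
> > 
> >     /* map: xdp_chain_key -> the next program to run; a missing entry
> >      * terminates the chain, and the last return code becomes the
> >      * final XDP verdict.
> >      */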
> > 
> > The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> > which will look up the chain sequence map and, if found, will loop through calls
> > to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> > previous program ID and return code.
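> > 
> > In simplified pseudo-code, the loop looks roughly like this (not the
> > exact patch code):
> > 
> >     ret = BPF_PROG_RUN(prog, xdp);
> >     while (chain_map) {
> >             struct xdp_chain_key key = {
> >                     .prog_id = prog->aux->id,
> >                     .retcode = ret,
> >             };
> > 
> >             prog = bpf_map_lookup_elem(chain_map, &key);
> >             if (!prog)
> >                     break;
> >             ret = BPF_PROG_RUN(prog, xdp);
> >     }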
> > 
> > An XDP chain call map can be installed on an interface by means of a new netlink
> > attribute containing an fd pointing to a chain call map. This can be supplied
> > along with the XDP prog fd, so that a chain map is always installed together
> > with an XDP program.
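> > 
> > From userspace this would look something like the following, using libnl
> > (the chain map attribute name here is illustrative, not final):
> > 
> >     struct nl_msg *msg = nlmsg_alloc();
> >     /* ... fill in ifinfomsg, open the nested IFLA_XDP attribute ... */
> >     nla_put_u32(msg, IFLA_XDP_FD, prog_fd);
> >     nla_put_u32(msg, IFLA_XDP_CHAIN_MAP_FD, chain_map_fd); /* new attr */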
> > 
> > # PERFORMANCE
> > 
> > I performed a simple performance test to get an initial feel for the overhead of
> > the chain call mechanism. This test consists of running only two programs in
> > sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
> > measure the drop PPS performance and compare it to a baseline of just a single
> > program that only returns XDP_DROP.
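> > 
> > The two programs are about as trivial as XDP programs get, on the order
> > of (program names mine):
> > 
> >     #include <linux/bpf.h>
> >     #include <bpf/bpf_helpers.h>
> > 
> >     SEC("xdp")
> >     int xdp_pass_prog(struct xdp_md *ctx)
> >     {
> >             return XDP_PASS;
> >     }
> > 
> >     SEC("xdp")
> >     int xdp_drop_prog(struct xdp_md *ctx)
> >     {
> >             return XDP_DROP;
> >     }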
> > 
> > For comparison, a test case that uses regular eBPF tail calls to sequence two
> > programs together is also included. Finally, because 'perf' showed that the
> > hashmap lookup was the largest single source of overhead, I also added a test
> > case where I removed the jhash() call from the hashmap code, and just use the
> > u32 key directly as an index into the hash bucket structure.
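> > 
> > The tail call variant chains the two programs explicitly through a
> > program array, along these lines (setup abbreviated):
> > 
> >     struct bpf_map_def SEC("maps") progs = {
> >             .type = BPF_MAP_TYPE_PROG_ARRAY,
> >             .key_size = sizeof(__u32),
> >             .value_size = sizeof(__u32),
> >             .max_entries = 1,
> >     };
> > 
> >     SEC("xdp")
> >     int xdp_pass_tail(struct xdp_md *ctx)
> >     {
> >             bpf_tail_call(ctx, &progs, 0); /* jump to the drop program */
> >             return XDP_PASS;               /* fallback if the call fails */
> >     }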
> > 
> > The performance for these different cases is as follows (with retpolines disabled):
> 
> retpolines enabled would also be interesting.
> 
> > 
> > | Test case                       | Perf      | Add. overhead | Total overhead |
> > |---------------------------------+-----------+---------------+----------------|
> > | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
> > | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
> 
> IMO even 1 Mpps overhead is too much for a feature that is primarily about
> ease of use. Sacrificing performance to make userland a bit easier is hard
> to justify for me when XDP _is_ singularly about performance. Also, that is
> nearly 10% overhead, which is fairly large. So I think going forward the
> performance gap needs to be closed.

Fully agree, for the case where this facility is not used, it must have
*zero* overhead. This is /one/ map flavor; in the future there will be other
facilities with different use-cases, but we cannot place them all into
the critical fast-path. Given this is BPF, we have the flexibility that
this can be hidden behind the scenes by rewriting and therefore only add
overhead when used.

What I also see as a red flag with this proposal is the fact that it's
tied to XDP only because you need to go and hack bpf_prog_run_xdp() all
the way to fetch xdp->rxq->dev->xdp_chain_map even though the map/concept
itself is rather generic and could be used in various other program types
as well. I'm very sure that once it's there, people will request it. Therefore,
it is better to explore a way where this requires no changes to BPF_PROG_RUN(),
similar to the original tail call work.
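
I.e., as posted, every packet on every XDP-enabled device would pay for
something along these lines (simplified), even with no chain map installed:

    chain_map = READ_ONCE(xdp->rxq->dev->xdp_chain_map);
    if (!chain_map)
            return BPF_PROG_RUN(prog, xdp);
    /* ... chain-call loop ... */

That extra load and branch in the fast path is exactly what rewriting
behind the scenes would avoid for non-users.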

Thanks,
Daniel
