Date:   Tue, 16 Nov 2021 16:29:37 +0000
From:   "Loftus, Ciara" <ciara.loftus@...el.com>
To:     Jesper Dangaard Brouer <jbrouer@...hat.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>
CC:     "brouer@...hat.com" <brouer@...hat.com>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "hawk@...nel.org" <hawk@...nel.org>,
        "john.fastabend@...il.com" <john.fastabend@...il.com>,
        "toke@...hat.com" <toke@...hat.com>,
        "bjorn@...nel.org" <bjorn@...nel.org>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
        "Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>
Subject: RE: [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx

> 
> 
> 
> On 16/11/2021 08.37, Ciara Loftus wrote:
> > The common case for AF_XDP sockets (xsks) is creating a single xsk on a
> > queue for sending and receiving frames, as this is analogous to HW packet
> > steering through RSS and other classification methods in the NIC. AF_XDP
> > uses the xdp redirect infrastructure to direct packets to the socket.
> > That infrastructure was designed for the much more complicated case of
> > DEVMAP xdp_redirects, which direct traffic to another netdev and thus
> > potentially another driver. In the xsk redirect case, by skipping the
> > unnecessary parts of this common code we can significantly improve
> > performance and pave the way for batching in the driver. This RFC
> > proposes one such way to simplify the infrastructure, which yields a 27%
> > increase in throughput and a decrease in cycles per packet of 24 cycles
> > [1]. The goal of this RFC is to start a discussion on how best to
> > simplify the single-socket datapath while providing one method as an
> > example.
> >
> > Current approach:
> > 1. XSK pointer: an xsk is created and a handle to the xsk is stored in
> >    the XSKMAP.
> > 2. XDP program: the bpf_redirect_map helper triggers the XSKMAP lookup,
> >    which stores the result (a handle to the xsk) and the map type
> >    (XSKMAP) in the percpu bpf_redirect_info struct. The XDP_REDIRECT
> >    action is returned.
> > 3. XDP_REDIRECT handling called by the driver: the map type (XSKMAP) is
> >    read from the bpf_redirect_info, which selects the xsk_map_redirect
> >    path. The xsk pointer is retrieved from the bpf_redirect_info and the
> >    XDP descriptor is pushed to the xsk's Rx ring. The socket is added to
> >    a list for flushing later.
> > 4. xdp_do_flush: iterate through the lists of all maps that can be used
> >    for redirect (CPUMAP, DEVMAP and XSKMAP). When the XSKMAP is flushed,
> >    go through all xsks that had any traffic redirected to them and bump
> >    the Rx ring head pointer(s).
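
For readers following the flow, here is a minimal sketch of the XDP program
side of this existing path (step 2 above). It assumes a standard clang/libbpf
BPF build and mirrors the kind of default program libbpf installs for AF_XDP;
it is an illustration, not the exact in-tree code:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* One slot per receive queue; userspace inserts the xsk fd for its queue. */
struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u32);
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_redirect_xskmap(struct xdp_md *ctx)
{
	/* bpf_redirect_map() looks up the xsk for this queue, stores the
	 * result and the map type in the per-CPU bpf_redirect_info, and
	 * returns XDP_REDIRECT (or XDP_PASS if the slot is empty). */
	return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
}

char _license[] SEC("license") = "GPL";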
> >
> > For the end goal of submitting the descriptor to the Rx ring and bumping
> > the head pointer of that ring, only some of these steps are needed. The
> > rest is overhead. The bpf_redirect_map infrastructure is needed for all
> > other redirect operations, but is not necessary when redirecting to a
> > single AF_XDP socket. Similarly, flushing the list for every map type in
> > step 4 is not necessary when only one socket needs to be flushed.
> >
> > Proposed approach:
> > 1. XSK pointer: an xsk is created and a handle to the xsk is stored both
> >    in the XSKMAP and in the netdev_rx_queue struct.
> > 2. XDP program: the new bpf_redirect_xsk helper returns XDP_REDIRECT_XSK.
> > 3. XDP_REDIRECT_XSK handling called by the driver: the xsk pointer is
> >    retrieved from the netdev_rx_queue struct and the XDP descriptor is
> >    pushed to the xsk's Rx ring.
> > 4. xsk_flush: fetch the handle from the netdev_rx_queue and flush the xsk.
> >
> > This fast path is triggered on XDP_REDIRECT_XSK if:
> >    (i) an AF_XDP socket SW Rx ring is configured, and
> >   (ii) exactly one xsk is attached to the queue.
> > If either of these conditions is not met, fall back to the same behavior
> > as the original approach: xdp_redirect_map. This is handled under the
> > hood in the new bpf_xdp_redirect_xsk helper, so the user does not need to
> > be aware of these conditions.
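
Continuing the sketch above (same includes), the XDP program side of the
proposed path might look like the following. Note that bpf_redirect_xsk() and
XDP_REDIRECT_XSK exist only in this RFC, and the exact helper signature shown
here is an assumption for illustration:

SEC("xdp")
int xdp_redirect_xsk_prog(struct xdp_md *ctx)
{
	/* No map argument: the xsk is taken from the receiving queue's
	 * netdev_rx_queue. If the fast-path conditions above do not hold,
	 * the helper falls back to the XSKMAP path and returns XDP_REDIRECT
	 * instead of XDP_REDIRECT_XSK. */
	return bpf_redirect_xsk(ctx, XDP_PASS);
}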
> >
> > Batching:
> > With this new approach it is possible to optimize the driver by
> > submitting a batch of descriptors to the Rx ring in step 3, simply by
> > verifying that the action returned from the program run for every packet
> > in the batch equals XDP_REDIRECT_XSK. With that action we know the socket
> > to redirect to is the same for each packet in the batch. This is not
> > possible with XDP_REDIRECT, because the xsk pointer is stored in the
> > bpf_redirect_info and is not guaranteed to be the same for every packet
> > in a batch.
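
A small illustration of the batching condition (not the actual i40e code from
this series): the driver may treat a burst as one batch only when every
verdict in it is XDP_REDIRECT_XSK, since only then is the destination socket
known to be identical across the burst.

/* Return true if the whole burst can be submitted to the single
 * queue-bound xsk in one go. */
static bool burst_all_redirect_xsk(const u32 *verdicts, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++)
		if (verdicts[i] != XDP_REDIRECT_XSK)
			return false;
	return true;
}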
> >
> > [1] Performance:
> > The benchmarks were performed on a VM running on a 2.4GHz Ice Lake host
> > with an i40e device passed through. The xdpsock app was run on a single
> > core with busy polling and configured in 'rxonly' mode:
> > ./xdpsock -i <iface> -r -B
> > The improvement in throughput when using the new bpf helper and XDP action
> > was measured at ~13% for scalar processing, with a reduction in cycles per
> > packet of ~13. A further ~14% improvement in throughput and a reduction of
> > ~11 cycles per packet were measured when the batched i40e driver path was
> > used, for a total improvement of ~27% in throughput and a reduction of ~24
> > cycles per packet.
> >
> > Other approaches considered:
> > Two other approaches were considered, the advantage of both being that
> > neither involved introducing a new XDP action. The first alternative was
> > to create a new map type, BPF_MAP_TYPE_XSKMAP_DIRECT. When the
> > XDP_REDIRECT action was returned, this map type could be checked and used
> > as an indicator to skip the map lookup and use the netdev_rx_queue xsk
> > instead. The second approach was similar and involved using a new
> > bpf_redirect_info flag in a similar fashion.
> > While both approaches yielded a performance improvement, it was about
> > half of what was measured for the approach outlined in this RFC. It seems
> > using bpf_redirect_info is too expensive.
> 
> I think it was Bjørn who discovered that accessing the per-CPU
> bpf_redirect_info struct has an overhead of approx 2 ns (times 2.4GHz,
> ~4.8 cycles). Your reduction in cycles per packet was ~13, of which ~4.8
> would be a large share.
> 
> The code accesses this_cpu_ptr(&bpf_redirect_info) two times: once in the
> BPF-helper redirect call and once in xdp_do_redirect.
> (Hint: xdp_redirect_map ends up calling __bpf_xdp_redirect_map.)
> 
> Thus, it seems (as you say) the bpf_redirect_info approach is too
> expensive. Maybe we should look at storing bpf_redirect_info in a place
> that doesn't require the this_cpu_ptr() lookup... or cache the lookup
> per NAPI cycle.
> Have you tried this?
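
To make the "cache the lookup per NAPI cycle" idea concrete, a rough sketch of
what a driver poll loop could do; xdp_do_redirect_cached() is hypothetical and
does not exist today, the point is only that this_cpu_ptr() gets resolved once
per poll rather than once or twice per packet:

static int napi_rx_poll_sketch(struct net_device *dev, struct bpf_prog *prog,
			       struct xdp_buff *bufs, int budget)
{
	/* Resolved once per NAPI poll instead of once (or twice) per packet. */
	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
	int i;

	for (i = 0; i < budget; i++) {
		if (bpf_prog_run_xdp(prog, &bufs[i]) == XDP_REDIRECT)
			xdp_do_redirect_cached(dev, &bufs[i], prog, ri);
	}
	return i;
}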
> 
> 
> > Also, centralised processing of XDP actions was investigated. This would
> > involve porting all drivers to a common interface for handling XDP
> > actions, which would greatly simplify the work involved in adding support
> > for new XDP actions such as XDP_REDIRECT_XSK. However, at this point it
> > was deemed more complex than adding support for the new action to every
> > driver. Should this series be considered worth pursuing as a proper patch
> > set, the intention would be to update each driver individually.
> 
> I'm fine with adding a new helper, but I don't like introducing a new
> XDP_REDIRECT_XSK action, which requires updating ALL the drivers.
> 
> With the XDP_REDIRECT infra we believed we didn't need to add more
> XDP-action code to drivers, as we multiplex/add new features by
> extending the bpf_redirect_info.
> In this extreme performance case, it seems the this_cpu_ptr "lookup" of
> bpf_redirect_info is the performance issue itself.
> 
> Could you experiment with different approaches that modify
> xdp_do_redirect() to detect whether the new bpf_redirect_xsk helper was
> called, prior to the this_cpu_ptr() call?
> (Thus avoiding the introduction of a new XDP action.)

Thanks for your feedback Jesper.
I understand the hesitation around adding a new action. If we can achieve the
same improvement without introducing a new action I would be very happy!
Without the new action we'll need a new way to indicate that the
bpf_redirect_xsk helper was called. Maybe another new field in the netdev
alongside the xsk_refcnt (see the rough sketch below). Or else extend
bpf_redirect_info, if we can find a new home for it, since it is currently too
costly to access.
Thanks for your suggestions. I'll experiment as you suggested and report back.
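
Roughly along these lines; every name below is an assumption for the sake of
discussion, not a final design:

/* In netdev_rx_queue, alongside the xsk handle this series already adds: */
	struct xdp_sock	*xsk;		/* added in patch 1 of this series */
	refcount_t	xsk_refcnt;
	bool		redirect_xsk;	/* hypothetical: set by bpf_redirect_xsk() */

/* xdp_do_redirect() could then check the queue before touching the per-CPU
 * bpf_redirect_info: */
	if (READ_ONCE(rxq->redirect_xsk))
		return xsk_rcv(rxq->xsk, xdp);	/* fast path, no this_cpu_ptr() */

	ri = this_cpu_ptr(&bpf_redirect_info);	/* existing slow path continues */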

Thanks,
Ciara

> 
> > Thank you to Magnus Karlsson and Maciej Fijalkowski for several
> > suggestions and insight provided.
> >
> > TODO:
> > * Add selftest(s)
> > * Add support for all copy and zero copy drivers
> > * Libxdp support
> >
> > The series applies on commit e5043894b21f ("bpftool: Use
> > libbpf_get_error() to check error")
> >
> > Thanks,
> > Ciara
> >
> > Ciara Loftus (8):
> >    xsk: add struct xdp_sock to netdev_rx_queue
> >    bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action
> >    xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush
> >    i40e: handle the XDP_REDIRECT_XSK action
> >    xsk: implement a batched version of xsk_rcv
> >    i40e: isolate descriptor processing in separate function
> >    i40e: introduce batched XDP rx descriptor processing
> >    libbpf: use bpf_redirect_xsk in the default program
> >
> >   drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  13 +-
> >   .../ethernet/intel/i40e/i40e_txrx_common.h    |   1 +
> >   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 285 +++++++++++++++---
> >   include/linux/netdevice.h                     |   2 +
> >   include/net/xdp_sock_drv.h                    |  49 +++
> >   include/net/xsk_buff_pool.h                   |  22 ++
> >   include/uapi/linux/bpf.h                      |  13 +
> >   kernel/bpf/verifier.c                         |   7 +-
> >   net/core/dev.c                                |  14 +
> >   net/core/filter.c                             |  26 ++
> >   net/xdp/xsk.c                                 |  69 ++++-
> >   net/xdp/xsk_queue.h                           |  31 ++
> >   tools/include/uapi/linux/bpf.h                |  13 +
> >   tools/lib/bpf/xsk.c                           |  50 ++-
> >   14 files changed, 551 insertions(+), 44 deletions(-)
> >
