Date:   Wed, 13 Feb 2019 12:55:30 +0100
From:   Jesper Dangaard Brouer <>
To:     Magnus Karlsson <>
Cc:     Jonathan Lemon <>,
        Magnus Karlsson <>,
        Björn Töpel <>, Daniel Borkmann <>,
        Network Development <>,
        Jakub Kicinski <>,
        Björn Töpel <>,
        "Zhang, Qi Z" <>
Subject: Re: [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support

On Wed, 13 Feb 2019 12:32:47 +0100
Magnus Karlsson <> wrote:

> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon <> wrote:
> >
> > On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
> >  
> > > This patch proposes to add AF_XDP support to libbpf. The main reason
> > > for this is to facilitate writing applications that use AF_XDP by
> > > offering higher-level APIs that hide many of the details of the AF_XDP
> > > uapi. This is in the same vein as libbpf facilitating XDP adoption by
> > > offering easy-to-use, higher-level interfaces to XDP
> > > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > > applications using it simpler and smaller, and finally also make it
> > > possible for applications to benefit from optimizations in the AF_XDP
> > > user space access code. Previously, people just copied and pasted the
> > > code from the sample application into their application, which is not
> > > desirable.  
> >
> > I like the idea of encapsulating the boilerplate logic in a library.
> >
> > I do think there is an important missing piece, though: there should be
> > some code that queries the netdev for how many queues are attached and
> > creates the appropriate number of umem/AF_XDP sockets.
> >
> > I ran into this issue when testing the current AF_XDP code - on my test
> > boxes, the mlx5 card has 55 channels (aka queues), so when the test program
> > binds only to channel 0, nothing works as expected, since not all traffic
> > is being intercepted.  While obvious in hindsight, this took a while to
> > track down.  
> Yes, agreed. You are not the first one to stumble upon this problem
> :-). Let me think a little bit on how to solve this in a good way. We
> need this to be simple and intuitive, as you say.

I see people hitting this with AF_XDP all the time... I had some
backup slides[2] in our FOSDEM presentation[1] that describe the issue,
explain the performance reason behind it, and propose a workaround.


Alternative workaround:
  * Create as many AF_XDP sockets as RXQs
  * Have userspace poll()/select() on all sockets
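The poll() half of that workaround might look like the sketch below. It assumes one AF_XDP socket per RX queue has already been created and bound; that setup is omitted. poll_all_queues() is a hypothetical helper. It only reports which sockets are readable, after which the application would drain each ready socket's RX ring.

```c
#include <poll.h>
#include <stdlib.h>

/* Hypothetical helper: wait until at least one of 'n' sockets (one per RX
 * queue) is readable. Writes the indices of ready sockets into 'ready'
 * (capacity n) and returns how many there are, 0 on timeout, -1 on error. */
static int poll_all_queues(const int *fds, int n, int timeout_ms, int *ready)
{
    struct pollfd *pfds = calloc((size_t)n, sizeof(*pfds));
    int nready = 0, ret;

    if (!pfds)
        return -1;
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }

    ret = poll(pfds, (nfds_t)n, timeout_ms);
    if (ret > 0) {
        /* Record every queue that has RX work pending. */
        for (int i = 0; i < n; i++)
            if (pfds[i].revents & POLLIN)
                ready[nready++] = i;
    }
    free(pfds);
    return ret < 0 ? -1 : nready;
}
```

Because poll() works on any file descriptor, the helper can be exercised with pipes standing in for the AF_XDP sockets.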

Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat

* Backup Slides                                                      :export:

** Slide: Where does AF_XDP performance come from?                  :export:

/Lock-free channel directly from driver RX-queue into AF_XDP socket/
- Single-Producer/Single-Consumer (SPSC) descriptor ring queues
- *Single*-/Producer/ (SP) by binding to a specific RX-*/queue id/*
  * NAPI softirq ensures only one CPU processes a given RX-queue id (per NAPI schedule)
- *Single*-/Consumer/ (SC) via a single application
- *Bounded* buffer pool (UMEM) allocated by userspace (registered with the kernel)
  * Descriptor(s) in ring(s) point into UMEM
  * /No memory allocation/, but frames must be returned to UMEM in a timely manner
- "Transport signature" that Van Jacobson talked about
  * Replaced by XDP/eBPF program choosing to XDP_REDIRECT
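To make the SPSC point concrete, here is a minimal single-producer/single-consumer ring using C11 atomics. It illustrates why no lock is needed when exactly one side writes each index. It is not the kernel's or libbpf's ring code (the real AF_XDP rings live in memory mmap()ed from the socket), and the names and the 8-entry size are illustrative assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Size must be a power of two so free-running indices can be masked. */
#define RING_SIZE 8u

struct spsc_ring {
    _Atomic uint32_t producer;   /* written only by the producer */
    _Atomic uint32_t consumer;   /* written only by the consumer */
    uint64_t desc[RING_SIZE];    /* e.g. offsets into UMEM */
};

/* Producer side: no lock, since only one thread ever writes 'producer'. */
static bool ring_produce(struct spsc_ring *r, uint64_t d)
{
    uint32_t prod = atomic_load_explicit(&r->producer, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_acquire);

    if (prod - cons == RING_SIZE)
        return false;                      /* ring full */
    r->desc[prod & (RING_SIZE - 1)] = d;
    /* Release: the descriptor write is visible before the new index. */
    atomic_store_explicit(&r->producer, prod + 1, memory_order_release);
    return true;
}

/* Consumer side: symmetric, only this thread writes 'consumer'. */
static bool ring_consume(struct spsc_ring *r, uint64_t *d)
{
    uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->producer, memory_order_acquire);

    if (prod == cons)
        return false;                      /* ring empty */
    *d = r->desc[cons & (RING_SIZE - 1)];
    /* Release: the slot may be reused by the producer after this store. */
    atomic_store_explicit(&r->consumer, cons + 1, memory_order_release);
    return true;
}
```

Indices run freely and are masked on access, so `prod - cons` stays correct across 32-bit wraparound as long as the size is a power of two.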

** Slide: Details: Actually *four* SPSC ring queues                 :export:

AF_XDP /socket/: Has /two rings/: *RX* and *TX*
 - Descriptors in the rings point into UMEM
/UMEM/ consists of a number of equally sized chunks
 - Has /two rings/: *FILL* ring and *COMPLETION* ring
 - FILL ring: application gives the kernel areas to fill with RX packets
 - COMPLETION ring: kernel tells the application TX is done for an area (it can be reused)
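The slide does not say how an application tracks which UMEM chunks it currently owns. One simple assumption is a free stack of chunk offsets: chunks are popped before being posted to the FILL or TX ring, and pushed back once an RX descriptor has been processed or the COMPLETION ring hands a TX chunk back. The struct and function names below are hypothetical, and the rings themselves are left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for a UMEM of equally sized chunks. Chunk i
 * starts at byte offset i * chunk_size; ring descriptors carry offsets. */
struct umem_pool {
    uint64_t *free_addrs;    /* stack of chunk offsets owned by the app */
    uint32_t nfree;
    uint32_t nchunks;
    uint32_t chunk_size;
};

static int umem_pool_init(struct umem_pool *p, uint32_t nchunks,
                          uint32_t chunk_size)
{
    p->free_addrs = malloc((size_t)nchunks * sizeof(*p->free_addrs));
    if (!p->free_addrs)
        return -1;
    for (uint32_t i = 0; i < nchunks; i++)
        p->free_addrs[i] = (uint64_t)i * chunk_size;
    p->nfree = nchunks;
    p->nchunks = nchunks;
    p->chunk_size = chunk_size;
    return 0;
}

/* Take a chunk to post on the FILL ring (RX) or the TX ring.
 * Returns UINT64_MAX when every chunk is in flight. */
static uint64_t umem_alloc(struct umem_pool *p)
{
    return p->nfree ? p->free_addrs[--p->nfree] : UINT64_MAX;
}

/* Return a chunk after processing an RX descriptor, or when the
 * COMPLETION ring reports that a TX chunk can be reused. */
static void umem_free(struct umem_pool *p, uint64_t addr)
{
    if (p->nfree < p->nchunks)   /* guard against double frees */
        p->free_addrs[p->nfree++] = addr - addr % p->chunk_size;
}
```

Rounding down in umem_free() lets the caller pass any address inside a chunk, for example a packet that starts after some headroom.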

** Slide: Gotcha by RX-queue id binding                             :export:

AF_XDP bound to */single RX-queue id/* (for SPSC performance reasons)
- NIC by default spreads flows with RSS-hashing over RX-queues
  * Traffic is likely not hitting the queue you expect
- You *MUST* configure NIC *HW filters* to /steer to RX-queue id/
  * Out of scope for XDP setup
  * Use ethtool or TC HW offloading for filter setup
- *Alternative* workaround
  * /Create as many AF_XDP sockets as RXQs/
  * Have userspace poll()/select() on all sockets
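Why traffic misses a single bound queue follows from how RSS picks a queue. The sketch below implements the Toeplitz hash that many NICs use for RSS, plus a default indirection table where entry i maps to queue i % nqueues (the default many Linux drivers use, cf. ethtool_rxfh_indir_default()). The 128-entry table size and the key, which is the sample key from Microsoft's RSS hash verification documentation, are assumptions. Real NICs may use other keys, table sizes, and hash inputs.

```c
#include <stddef.h>
#include <stdint.h>

/* Sample 40-byte key from Microsoft's RSS verification documentation. */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
    0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
    0xae, 0x7b, 0x30, 0xb4, 0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30,
    0xf2, 0x0c, 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: for every set input bit (MSB first), XOR in the 32-bit
 * window of the key that starts at that bit position. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t keylen,
                              const uint8_t *in, size_t len)
{
    uint32_t result = 0;
    uint32_t window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
                      (uint32_t)key[2] << 8 | key[3];

    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            if (in[i] & (1u << b))
                result ^= window;
            /* Advance the window by one key bit. */
            window <<= 1;
            if (i + 4 < keylen && (key[i + 4] & (1u << b)))
                window |= 1;
        }
    }
    return result;
}

/* Default indirection table: entry i -> queue i % nqueues, and the low
 * bits of the hash select the entry (table_size a power of two). */
static unsigned rss_queue(uint32_t hash, unsigned table_size, unsigned nqueues)
{
    return (hash % table_size) % nqueues;
}
```

For a TCP/IPv4 flow the input is source address, destination address, source port, destination port, all in network byte order. With this key, the verification flow 66.9.149.187:2794 to 161.142.100.80:1766 hashes to 0x51ccc178; with a 128-entry table over 55 queues it lands on queue 10, so an AF_XDP socket bound only to queue 0 would never see it.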
