Message-ID: <Y5t5MZH1UwfLqhNC@C02F109XMD6R.local>
Date:   Thu, 15 Dec 2022 13:50:15 -0600
From:   Alex Forster <aforster@...udflare.com>
To:     Magnus Karlsson <magnus.karlsson@...il.com>
Cc:     Shawn Bohrer <sbohrer@...udflare.com>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, bjorn@...nel.org, magnus.karlsson@...el.com,
        kernel-team@...udflare.com
Subject: Re: Possible race with xsk_flush

Hi Magnus,

> Could you please share how you set up the two AF_XDP sockets?

Our architecture is fairly unusual:

   outside of │ inside of
    namespace │ namespace
              │
    ┌───────┐ │ ┌───────┐
    │ outer │ │ │ inner │
    │  veth │ │ │ veth  │
    └──┬─▲──┘ │ └──┬─▲──┘
       │ │    │    │ │
    ┌──▼─┴────┴────▼─┴──┐
    │    shared umem    │
    └───────────────────┘

The goal is to position ourselves in the middle of a veth pair so that
we can perform bidirectional traffic inspection and manipulation. To do
this, we attach AF_XDP to both veth interfaces and share a umem between
them. This allows us to forward packets between the veth interfaces
without copying in userspace.

These interfaces are both multi-queue, with an AF_XDP socket attached to
each queue. Each queue is serviced by its own (unpinned) thread and has
its own rx/tx/fill/completion rings. We also enable threaded NAPI on both
interfaces, which may be an important detail: the problem appears much
harder (though not impossible) to reproduce with threaded NAPI enabled.
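
For concreteness, the layout looks roughly like the sketch below, written
against the libxdp xsk helpers (xsk_umem__create() /
xsk_socket__create_shared()). The queue count, frame sizing, interface
names and the missing error handling are illustrative, and namespace
handling for the inner veth is elided; this is not our production code:

```c
/* Sketch only -- not our production code. */
#include <stdlib.h>
#include <unistd.h>
#include <xdp/xsk.h>    /* <bpf/xsk.h> with older libbpf */

#define QUEUES     4
#define NUM_FRAMES 4096
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE

struct queue_sock {
    struct xsk_socket   *xsk;
    struct xsk_ring_cons rx;
    struct xsk_ring_prod tx;
    struct xsk_ring_prod fill;  /* per-socket fill ring */
    struct xsk_ring_cons comp;  /* per-socket completion ring */
};

int main(void)
{
    struct queue_sock outer[QUEUES], inner[QUEUES];
    struct xsk_umem *umem;
    void *area;

    /* One umem backs every socket on both interfaces. */
    if (posix_memalign(&area, getpagesize(), NUM_FRAMES * FRAME_SIZE))
        return 1;
    xsk_umem__create(&umem, area, NUM_FRAMES * FRAME_SIZE,
                     &outer[0].fill, &outer[0].comp, NULL);

    for (int q = 0; q < QUEUES; q++) {
        /* One socket per queue on the outer veth, each with its own
         * rx/tx/fill/completion rings... */
        xsk_socket__create_shared(&outer[q].xsk, "outer-veth", q, umem,
                                  &outer[q].rx, &outer[q].tx,
                                  &outer[q].fill, &outer[q].comp, NULL);
        /* ...and a matching socket per queue on the inner veth, bound
         * to the same umem. */
        xsk_socket__create_shared(&inner[q].xsk, "inner-veth", q, umem,
                                  &inner[q].rx, &inner[q].tx,
                                  &inner[q].fill, &inner[q].comp, NULL);
    }

    /* Each queue is then serviced by its own (unpinned) thread, which
     * moves descriptors between the outer and inner sockets, so packet
     * payloads never leave the shared umem. */
    return 0;
}
```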

Here’s a script that configures a namespace and veth pair closely
resembling our production setup, except that it does not enable threaded
NAPI:

```
#!/bin/bash

set -e -u -x -o pipefail

# Default to one queue pair (and thus one AF_XDP socket per interface) per CPU
QUEUES=${QUEUES:=$(($(grep -c ^processor /proc/cpuinfo)))}

OUTER_CUSTOMER_VETH=${OUTER_CUSTOMER_VETH:=outer-veth}
INNER_CUSTOMER_VETH=${INNER_CUSTOMER_VETH:=inner-veth}
CUSTOMER_NAMESPACE=${CUSTOMER_NAMESPACE:=customer-namespace}

# Create the namespace and bring loopback up inside it
ip netns add $CUSTOMER_NAMESPACE
ip netns exec $CUSTOMER_NAMESPACE bash <<EOF
  set -e -u -x -o pipefail
  ip addr add 127.0.0.1/8 dev lo
  ip link set dev lo up
EOF

# Create the multi-queue veth pair, moving the peer end into the namespace
ip link add \
  name $OUTER_CUSTOMER_VETH \
  numrxqueues $QUEUES numtxqueues $QUEUES type veth \
  peer name $INNER_CUSTOMER_VETH netns $CUSTOMER_NAMESPACE \
  numrxqueues $QUEUES numtxqueues $QUEUES

# Outer end: disable offloads, bring the link up, assign a link-local /30
ethtool -K $OUTER_CUSTOMER_VETH \
  gro off gso off tso off tx off rxvlan off txvlan off
ip link set dev $OUTER_CUSTOMER_VETH up
ip addr add 169.254.10.1/30 dev $OUTER_CUSTOMER_VETH

# Inner end: same offload and addressing setup, inside the namespace
ip netns exec $CUSTOMER_NAMESPACE bash <<EOF
  set -e -u -x -o pipefail
  ethtool -K $INNER_CUSTOMER_VETH \
    gro off gso off tso off tx off rxvlan off txvlan off
  ip link set dev $INNER_CUSTOMER_VETH up
  ip addr add 169.254.10.2/30 dev $INNER_CUSTOMER_VETH
EOF
```
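
In production we additionally flip threaded NAPI on for both interfaces,
which just means writing 1 to /sys/class/net/<iface>/threaded. A
hypothetical helper to illustrate (again, not our actual code):

```c
/* Hypothetical helper (not our production code): threaded NAPI is
 * toggled by writing 0/1 to /sys/class/net/<ifname>/threaded. */
#include <stdio.h>

static int set_threaded_napi(const char *ifname, int enable)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/class/net/%s/threaded", ifname);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%d\n", enable);
    return fclose(f);   /* 0 on success */
}

int main(void)
{
    /* The repro script above leaves this off; production turns it on
     * for both ends of the veth pair. */
    return set_threaded_napi("outer-veth", 1);
}
```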

> Are you using XDP_DRV mode in your tests?

Yes.
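
To illustrate what we mean (using libbpf's bpf_xdp_attach() here purely
as an example; our setup code differs in the details), the programs are
attached in driver mode rather than generic mode:

```c
/* Fragment for illustration: attach an already-loaded XDP program in
 * native/driver mode. ifindex and prog_fd come from elsewhere. */
#include <bpf/libbpf.h>
#include <linux/if_link.h>

int attach_native_xdp(int ifindex, int prog_fd)
{
    /* With XDP_FLAGS_DRV_MODE the attach fails if the driver lacks
     * native XDP support instead of falling back to generic mode. */
    return bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_DRV_MODE, NULL);
}
```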
