netdev - Re: [PATCH bpf v2] xdp: Fix spurious packet loss in generic XDP TX path

Message-ID: <CAM1=_QTrTPaQn9fuYoOGV6vs-gjgztFyTieQKCCcY0pFuqvpKA@mail.gmail.com>
Date:   Sat, 2 Jul 2022 06:39:01 +0200
From:   Johan Almbladh <johan.almbladh@...finetworks.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>, song@...nel.org,
        martin.lau@...ux.dev, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, jolsa@...nel.org,
        Freysteinn.Alfredsson@....se, toke@...hat.com,
        bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf v2] xdp: Fix spurious packet loss in generic XDP TX path

On Sat, Jul 2, 2022 at 12:47 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 7/1/22 5:12 PM, Johan Almbladh wrote:
> > The byte queue limits (BQL) mechanism is intended to move queuing from
> > the driver to the network stack in order to reduce latency caused by
> > excessive queuing in hardware. However, when transmitting or redirecting
> > a packet using generic XDP, the qdisc layer is bypassed and there are no
> > additional queues. Since netif_xmit_stopped() also takes BQL limits into
> > account, but without having any alternative queuing, packets are
> > silently dropped.
> >
> > This patch modifies the drop condition to only consider cases when the
> > driver itself cannot accept any more packets. This is analogous to the
> > condition in __dev_direct_xmit(). Dropped packets are also counted on
> > the device.
> >
> > Bypassing the qdisc layer in the generic XDP TX path means that XDP
> > packets are able to starve other packets going through a qdisc, and
> > DDOS attacks will be more effective. In-driver-XDP use dedicated TX
> > queues, so they do not have this starvation issue.
> >
> > Signed-off-by: Johan Almbladh <johan.almbladh@...finetworks.com>
> > ---
> >   net/core/dev.c | 9 +++++++--
> >   1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8e6f22961206..00fb9249357f 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4863,7 +4863,10 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> >   }
> >
> >   /* When doing generic XDP we have to bypass the qdisc layer and the
> > - * network taps in order to match in-driver-XDP behavior.
> > + * network taps in order to match in-driver-XDP behavior. This also means
> > + * that XDP packets are able to starve other packets going through a qdisc,
> > + * and DDOS attacks will be more effective. In-driver-XDP use dedicated TX
> > + * queues, so they do not have this starvation issue.
> >    */
> >   void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
> >   {
> > @@ -4875,10 +4878,12 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
> >       txq = netdev_core_pick_tx(dev, skb, NULL);
> >       cpu = smp_processor_id();
> >       HARD_TX_LOCK(dev, txq, cpu);
> > -     if (!netif_xmit_stopped(txq)) {
> > +     if (!netif_xmit_frozen_or_drv_stopped(txq)) {
> >               rc = netdev_start_xmit(skb, dev, txq, 0);
> >               if (dev_xmit_complete(rc))
> >                       free_skb = false;
> > +     } else {
> > +             dev_core_stats_tx_dropped_inc(dev);
> >       }
> >       HARD_TX_UNLOCK(dev, txq);
> >       if (free_skb) {
>
> Small q: Shouldn't the drop counter go into the free_skb branch?

This was on purpose, to avoid incrementing the counter twice, but I think
you are right. The driver updates the tx_dropped counter if the packet
is dropped, but I see that it also consumes the skb in those cases.
Looking again at the driver tree, I cannot find any examples where the
driver updates the counter *without* consuming the skb. This logic
makes sense - whoever consumes the skb is also responsible for
updating the counters on the netdev.
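
To make the ownership rule concrete, here is a minimal userspace sketch
(not kernel code). All mock_* names, fields and return codes are
hypothetical stand-ins invented for illustration. It assumes the driver
either consumes the skb and does its own drop accounting, or reports busy
and leaves the skb with the caller, and it places the caller's drop
counter in the free_skb branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; not the real structures. */
struct mock_dev {
	unsigned long tx_dropped; /* models the netdev tx_dropped counter */
	bool queue_stopped;       /* models netif_xmit_frozen_or_drv_stopped() */
	bool driver_drops;        /* driver consumes the skb but drops it */
	bool driver_busy;         /* driver reports busy without consuming */
};

struct mock_skb {
	bool consumed;            /* set by whoever takes ownership of the skb */
};

enum mock_tx_rc { MOCK_TX_OK, MOCK_TX_BUSY };

/* Driver hook: if it consumes the skb, it also owns the drop accounting. */
static enum mock_tx_rc mock_start_xmit(struct mock_skb *skb,
				       struct mock_dev *dev)
{
	if (dev->driver_busy)
		return MOCK_TX_BUSY;    /* not consumed, caller still owns skb */
	if (dev->driver_drops)
		dev->tx_dropped++;      /* driver dropped it, driver counts it */
	skb->consumed = true;
	return MOCK_TX_OK;
}

/* Caller: counts a drop only on the path where it frees the skb itself. */
static void mock_generic_xdp_tx(struct mock_skb *skb, struct mock_dev *dev)
{
	bool free_skb = true;

	if (!dev->queue_stopped) {
		if (mock_start_xmit(skb, dev) == MOCK_TX_OK)
			free_skb = false;
	}
	if (free_skb) {
		assert(!skb->consumed); /* nobody else may have taken it */
		dev->tx_dropped++;
		skb->consumed = true;   /* models kfree_skb() */
	}
}
```

In this model each dropped packet is counted exactly once: by the driver
when it consumed the skb, or by the caller when it did not. The busy case
is the one that a counter placed only in the else branch would miss.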

>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 00fb9249357f..17e2c39477c5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4882,11 +4882,10 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
>                  rc = netdev_start_xmit(skb, dev, txq, 0);
>                  if (dev_xmit_complete(rc))
>                          free_skb = false;
> -       } else {
> -               dev_core_stats_tx_dropped_inc(dev);
>          }
>          HARD_TX_UNLOCK(dev, txq);
>          if (free_skb) {
> +               dev_core_stats_tx_dropped_inc(dev);
>                  trace_xdp_exception(dev, xdp_prog, XDP_TX);
>                  kfree_skb(skb);
>          }
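
For reference, the difference between the two queue checks discussed in
the patch description can be modeled in userspace as below. This is a
sketch, not the kernel definitions: the mock_* names and bit values are
hypothetical, and it assumes that a BQL limit is signalled through a
stack-side "off" bit that is separate from the driver-side one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue state bits, modeled on the description in the patch. */
enum {
	MOCK_DRV_XOFF   = 1u << 0, /* driver cannot accept more packets */
	MOCK_STACK_XOFF = 1u << 1, /* stack-side stop, e.g. BQL limit reached */
	MOCK_FROZEN     = 1u << 2, /* queue is frozen */
};

/* Models the old check: true if either driver or stack stopped the queue. */
static bool mock_xmit_stopped(unsigned int state)
{
	return (state & (MOCK_DRV_XOFF | MOCK_STACK_XOFF)) != 0;
}

/* Models the new check: ignores the stack-side (BQL) stop bit. */
static bool mock_xmit_frozen_or_drv_stopped(unsigned int state)
{
	return (state & (MOCK_DRV_XOFF | MOCK_FROZEN)) != 0;
}
```

With only MOCK_STACK_XOFF set, mock_xmit_stopped() is true while
mock_xmit_frozen_or_drv_stopped() is false, which corresponds to the case
where the old condition dropped a packet the driver could still accept.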