Message-ID: <CAJ8uoz2nYwGROzRwevT2X5U-XyGePrbJyM63iE1QZL=V-Y4pUg@mail.gmail.com>
Date:   Tue, 15 Sep 2020 19:46:35 +0200
From:   Magnus Karlsson <magnus.karlsson@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Björn Töpel <bjorn.topel@...el.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Network Development <netdev@...r.kernel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        A.Zema@...convsystems.com
Subject: Re: [PATCH bpf v4] xsk: do not discard packet when NETDEV_TX_BUSY

On Tue, Sep 15, 2020 at 5:49 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> Hey Magnus,
>
> On 9/11/20 2:43 PM, Magnus Karlsson wrote:
> > From: Magnus Karlsson <magnus.karlsson@...el.com>
> >
> > In the skb Tx path, transmission of a packet is performed with
> > dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it
> > signifies that it was not possible to send the packet right now,
> > please try later. Unfortunately, the xsk transmit code discarded the
> > packet and returned EBUSY to the application. Fix this unnecessary
> > packet loss by not discarding the packet in the Tx ring and returning
> > EAGAIN. As EAGAIN is returned to the application, it can then retry
> > the send operation later and the packet will then likely be sent as
> > the driver will then likely have space/resources to send the packet.
> >
> > In summary, EAGAIN tells the application that the packet was not
> > discarded from the Tx ring and that it needs to call send()
> > again. EBUSY, on the other hand, signifies that the packet was not
> > sent and was discarded from the Tx ring. The application needs to put the
> > packet on the Tx ring again if it wants it to be sent.
> >
> > Fixes: 35fcde7f8deb ("xsk: support for Tx")
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
> > Reported-by: Arkadiusz Zema <A.Zema@...convsystems.com>
> > Suggested-by: Arkadiusz Zema <A.Zema@...convsystems.com>
> > Suggested-by: Daniel Borkmann <daniel@...earbox.net>
> > ---
> > v3->v4:
> > * Free the skb without triggering the drop trace when NETDEV_TX_BUSY
> > * Call consume_skb instead of kfree_skb when the packet has been
> >    sent successfully for correct tracing
> > * Use sock_wfree as destructor when NETDEV_TX_BUSY
> > v1->v3:
> > * Hinder dev_direct_xmit() from freeing and completing the packet to
> >    user space by manipulating the skb->users count as suggested by
> >    Daniel Borkmann.
> > ---
> >   net/xdp/xsk.c | 17 ++++++++++++++++-
> >   1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index c323162..d32e39d 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -377,15 +377,30 @@ static int xsk_generic_xmit(struct sock *sk)
> >               skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
> >               skb->destructor = xsk_destruct_skb;
> >
> > +             /* Hinder dev_direct_xmit from freeing the packet and
> > +              * therefore completing it in the destructor
> > +              */
> > +             refcount_inc(&skb->users);
> >               err = dev_direct_xmit(skb, xs->queue_id);
> > +             if  (err == NETDEV_TX_BUSY) {
> > +                     /* Tell user-space to retry the send */
> > +                     skb->destructor = sock_wfree;
>
> I see, good catch, you need this one here as otherwise you leak wmem accounting
> given it's also part of xsk_destruct_skb() and we do free the prior allocated skb
> in this case.
>
> > +                     /* Free skb without triggering the perf drop trace */
> > +                     __kfree_skb(skb);
>
> As a minor nit, I would just use consume_skb(skb) here given this doesn't blindly
> ignore the skb_unref(). It's mostly about seeing where drops are happening so that
> tracepoint is set to kfree_skb() which is the more interesting one. Other than that
> looks good and ready to go. Thanks (& sorry for late reply)!

Thank you for reviewing this. I will spin a v5.

> > +                     err = -EAGAIN;
> > +                     goto out;
> > +             }
> > +
> >               xskq_cons_release(xs->tx);
> >               /* Ignore NET_XMIT_CN as packet might have been sent */
> > -             if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {
> > +             if (err == NET_XMIT_DROP) {
> >                       /* SKB completed but not sent */
> > +                     kfree_skb(skb);
> >                       err = -EBUSY;
> >                       goto out;
> >               }
> >
> > +             consume_skb(skb);
> >               sent_frame = true;
> >       }
> >
> >
>
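
For application writers, the EAGAIN/EBUSY distinction described in the
commit message could be handled roughly as sketched below. This is only
an illustration of the semantics stated above, not code from the patch:
the enum, the function names and the callback type are all hypothetical,
and the kick callback is assumed to wrap something like
sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0) on the AF_XDP socket and to
return 0 or a negative errno. Treating every other error as fatal is a
simplification.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical names: what the application should do after a Tx kick. */
enum tx_action {
	TX_DONE,	/* kick succeeded */
	TX_RETRY_KICK,	/* EAGAIN: descriptor is still on the Tx ring */
	TX_REQUEUE,	/* EBUSY: descriptor was consumed but not sent */
	TX_FATAL	/* any other error (simplification) */
};

/* ret and err are the return value and errno of the kick syscall. */
enum tx_action classify_tx_kick(long ret, int err)
{
	if (ret >= 0)
		return TX_DONE;

	switch (err) {
	case EAGAIN:
		/* Nothing was discarded, so calling send again is enough. */
		return TX_RETRY_KICK;
	case EBUSY:
		/* The packet must be put on the Tx ring again by the caller. */
		return TX_REQUEUE;
	default:
		return TX_FATAL;
	}
}

/*
 * Hypothetical callback wrapping the real kick, for example
 * sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0).
 * Assumed to return 0 on success or a negative errno on failure.
 */
typedef int (*kick_fn)(void *ctx);

/* Retry the kick on EAGAIN, at most max_retries extra times. */
enum tx_action kick_with_retry(kick_fn kick, void *ctx, int max_retries)
{
	for (int attempt = 0; ; attempt++) {
		int rc = kick(ctx);
		enum tx_action act = classify_tx_kick(rc, rc < 0 ? -rc : 0);

		if (act != TX_RETRY_KICK || attempt >= max_retries)
			return act;
	}
}
```

If kick_with_retry() returns TX_REQUEUE, the application would need to
write the descriptor to the Tx ring again before the next kick. If it
still returns TX_RETRY_KICK after the retry budget is used up, the
descriptor is still queued and a later kick can pick it up.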
