Message-ID: <CAF=yD-JxQsJuJMh4=3An=oE0+R6FJ7f7CnUmQP41EOjEMc7VmQ@mail.gmail.com>
Date: Tue, 24 Apr 2018 12:57:38 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Björn Töpel <bjorn.topel@...il.com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <ast@...com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>,
"Michael S. Tsirkin" <mst@...hat.com>,
Network Development <netdev@...r.kernel.org>,
michael.lundkvist@...csson.com,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
"Singhai, Anjali" <anjali.singhai@...el.com>,
"Zhang, Qi Z" <qi.z.zhang@...el.com>
Subject: Re: [PATCH bpf-next 13/15] xsk: support for Tx
On Mon, Apr 23, 2018 at 9:56 AM, Björn Töpel <bjorn.topel@...il.com> wrote:
> From: Magnus Karlsson <magnus.karlsson@...el.com>
>
> Here, Tx support is added. The user fills the Tx queue with frames to
> be sent by the kernel, and lets the kernel know using the sendmsg
> syscall.
>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
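For context, the userspace kick after filling the Tx ring is just an
empty sendmsg on the AF_XDP socket; a minimal sketch (xsk_fd is a
placeholder for the socket fd, and MSG_DONTWAIT is required since the
blocking path below returns -EOPNOTSUPP):

	struct msghdr msg = {};

	/* xsk_fd: AF_XDP socket, Tx ring already populated */
	if (sendmsg(xsk_fd, &msg, MSG_DONTWAIT) < 0)
		perror("sendmsg");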
> +static int xsk_xmit_skb(struct sk_buff *skb)
This is basically packet_direct_xmit. It might be better to move that
helper to net/core/dev.c and use it in both AF_PACKET and AF_XDP.
Also, AF_XDP may eventually want to support the regular path through
dev_queue_xmit, to go through traffic shaping.
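Something like this (untested sketch; dev_direct_xmit is just a
suggested name) could live in net/core/dev.c, with packet_direct_xmit
and xsk_xmit_skb becoming thin wrappers around it:

int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
{
	struct net_device *dev = skb->dev;
	struct sk_buff *orig_skb = skb;
	struct netdev_queue *txq;
	int ret = NETDEV_TX_BUSY;
	bool again = false;

	if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
		goto drop;

	skb = validate_xmit_skb_list(skb, dev, &again);
	if (skb != orig_skb)
		goto drop;

	txq = netdev_get_tx_queue(dev, queue_id);

	local_bh_disable();
	HARD_TX_LOCK(dev, txq, smp_processor_id());
	if (!netif_xmit_frozen_or_drv_stopped(txq))
		ret = netdev_start_xmit(skb, dev, txq, false);
	HARD_TX_UNLOCK(dev, txq);
	local_bh_enable();

	if (!dev_xmit_complete(ret))
		kfree_skb(skb);

	return ret;
drop:
	atomic_long_inc(&dev->tx_dropped);
	/* also frees any segment chain validate_xmit_skb_list built */
	kfree_skb_list(skb);
	return NET_XMIT_DROP;
}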
> +{
> + struct net_device *dev = skb->dev;
> + struct sk_buff *orig_skb = skb;
> + struct netdev_queue *txq;
> + int ret = NETDEV_TX_BUSY;
> + bool again = false;
> +
> + if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
> + goto drop;
> +
> + skb = validate_xmit_skb_list(skb, dev, &again);
> + if (skb != orig_skb)
> + return NET_XMIT_DROP;
The generated segment list needs to be freed on error; see
packet_direct_xmit.
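That is, something like (untested):

	skb = validate_xmit_skb_list(skb, dev, &again);
	if (skb != orig_skb)
		goto drop;

with the drop label calling kfree_skb_list(skb) instead of leaking
whatever chain validate_xmit_skb_list may have generated.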
> +
> + txq = skb_get_tx_queue(dev, skb);
> +
> + local_bh_disable();
> +
> + HARD_TX_LOCK(dev, txq, smp_processor_id());
> + if (!netif_xmit_frozen_or_drv_stopped(txq))
> + ret = netdev_start_xmit(skb, dev, txq, false);
> + HARD_TX_UNLOCK(dev, txq);
> +
> + local_bh_enable();
> +
> + if (!dev_xmit_complete(ret))
> + goto out_err;
> +
> + return ret;
> +drop:
> + atomic_long_inc(&dev->tx_dropped);
> +out_err:
> + return NET_XMIT_DROP;
> +}
> +static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
> + size_t total_len)
> +{
> + bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
> + u32 max_batch = TX_BATCH_SIZE;
> + struct xdp_sock *xs = xdp_sk(sk);
> + bool sent_frame = false;
> + struct xdp_desc desc;
> + struct sk_buff *skb;
> + int err = 0;
> +
> + if (unlikely(!xs->tx))
> + return -ENOBUFS;
> + if (need_wait)
> + return -EOPNOTSUPP;
> +
> + mutex_lock(&xs->mutex);
> +
> + while (xskq_peek_desc(xs->tx, &desc)) {
It is possible to pass a chain of skbs to validate_xmit_skb_list and
eventually pass this chain to xsk_xmit_skb, amortizing the cost of
taking the txq lock. Fine to ignore for this patch set.
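Rough sketch of the batched variant (untested): link the skbs built in
the loop and flush the whole chain once, so the txq lock is taken a
single time per batch:

	struct sk_buff *list = NULL, **tail = &list;

	while (xskq_peek_desc(xs->tx, &desc)) {
		/* ... allocate and fill skb from desc as below ... */
		*tail = skb;
		tail = &skb->next;
	}
	*tail = NULL;
	if (list)
		err = xsk_xmit_skb(list);	/* one lock per batch */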
> + char *buffer;
> + u32 id, len;
> +
> + if (max_batch-- == 0) {
> + err = -EAGAIN;
> + goto out;
> + }
> +
> + if (xskq_reserve_id(xs->umem->cq)) {
> + err = -EAGAIN;
> + goto out;
> + }
> +
> + len = desc.len;
> + if (unlikely(len > xs->dev->mtu)) {
> + err = -EMSGSIZE;
> + goto out;
> + }
> +
> + skb = sock_alloc_send_skb(sk, len, !need_wait, &err);
> + if (unlikely(!skb)) {
> + err = -EAGAIN;
> + goto out;
> + }
> +
> + skb_put(skb, len);
> + id = desc.idx;
> + buffer = xdp_umem_get_data(xs->umem, id) + desc.offset;
> + err = skb_store_bits(skb, 0, buffer, len);
> + if (unlikely(err))
> + goto out_store;
As xsk_destruct_skb delays notification until consume_skb is called, this
copy can be avoided by linking the xdp buffer into the skb frags array,
analogous to tpacket_snd.
You probably don't care much about the copy slow path, and this can be
implemented later, so there is no need to do it in this patch set.
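For reference, a rough sketch of the frags approach (untested; assumes
the umem is backed by normal pages that can be pinned with get_page,
and that the skb is allocated without linear data):

	struct page *page = virt_to_page(buffer);

	get_page(page);		/* hold the umem page until consume_skb */
	skb_fill_page_desc(skb, 0, page, offset_in_page(buffer), len);
	skb->len += len;
	skb->data_len += len;
	skb->truesize += len;

which would replace the skb_put/skb_store_bits copy above.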
> +static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> + struct xdp_desc *desc)
> +{
> + struct xdp_rxtx_ring *ring;
> +
> + if (q->cons_tail == q->cons_head) {
> + WRITE_ONCE(q->ring->consumer, q->cons_tail);
> + q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
> +
> + /* Order consumer and data */
> + smp_rmb();
> +
> + return xskq_validate_desc(q, desc);
> + }
> +
> + ring = (struct xdp_rxtx_ring *)q->ring;
> + *desc = ring->desc[q->cons_tail & q->ring_mask];
> + return desc;
This only validates descriptors when taking the refill branch; the
fallthrough path copies the descriptor without validating it.
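e.g. (untested, assuming xskq_validate_desc checks the entry that was
just read into *desc):

	ring = (struct xdp_rxtx_ring *)q->ring;
	*desc = ring->desc[q->cons_tail & q->ring_mask];
	return xskq_validate_desc(q, desc);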