[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAH3MdRXk6CDDRPSSao9coBh3V34tFYb6sCUFVosN5bZ4==GExA@mail.gmail.com>
Date: Wed, 19 Sep 2018 21:47:38 -0700
From: Y Song <ys114321@...il.com>
To: Magnus Karlsson <magnus.karlsson@...il.com>
Cc: magnus.karlsson@...el.com,
Björn Töpel <bjorn.topel@...el.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy
and zero-copy on one queue id
On Wed, Sep 19, 2018 at 12:15 AM Magnus Karlsson
<magnus.karlsson@...il.com> wrote:
>
> On Tue, Sep 18, 2018 at 7:23 PM Y Song <ys114321@...il.com> wrote:
> >
> > On Tue, Sep 18, 2018 at 3:13 AM Magnus Karlsson
> > <magnus.karlsson@...el.com> wrote:
> > >
> > > Previously, the xsk code did not record which umem was bound to a
> > > specific queue id. This was not required if all drivers were zero-copy
> > > enabled as this had to be recorded in the driver anyway. So if a user
> > > tried to bind two umems to the same queue, the driver would say
> > > no. But if copy-mode was first enabled and then zero-copy mode (or the
> > > reverse order), we mistakenly enabled both of them on the same umem
> > > leading to buggy behavior. The main culprit for this is that we did
> > > not store the association of umem to queue id in the copy case and
> > > only relied on the driver reporting this. As this relation was not
> > > stored in the driver for copy mode (it does not rely on the AF_XDP
> > > NDOs), this obviously could not work.
> > >
> > > This patch fixes the problem by always recording the umem to queue id
> > > relationship in the netdev_queue and netdev_rx_queue structs. This way
> > > we always know what kind of umem has been bound to a queue id and can
> > > act appropriately at bind time.
> > >
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
> > > ---
> > > net/xdp/xdp_umem.c | 87 +++++++++++++++++++++++++++++++++++++++++++-----------
> > > net/xdp/xdp_umem.h | 2 +-
> > > 2 files changed, 71 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> > > index b3b632c..12300b5 100644
> > > --- a/net/xdp/xdp_umem.c
> > > +++ b/net/xdp/xdp_umem.c
> > > @@ -42,6 +42,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
> > > }
> > > }
> > >
> > > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > > + * not know if the device has more tx queues than rx, or the opposite.
> > > + * This might also change during run time.
> > > + */
> > > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > > + u16 queue_id)
> > > +{
> > > + if (queue_id < dev->real_num_rx_queues)
> > > + dev->_rx[queue_id].umem = umem;
> > > + if (queue_id < dev->real_num_tx_queues)
> > > + dev->_tx[queue_id].umem = umem;
> > > +}
> > > +
> > > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > > + u16 queue_id)
> > > +{
> > > + if (queue_id < dev->real_num_rx_queues)
> > > + return dev->_rx[queue_id].umem;
> > > + if (queue_id < dev->real_num_tx_queues)
> > > + return dev->_tx[queue_id].umem;
> > > +
> > > + return NULL;
> > > +}
> > > +
> > > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > > +{
> > > + /* Zero out the entry independent on how many queues are configured
> > > + * at this point in time, as it might be used in the future.
> > > + */
> > > + if (queue_id < dev->num_rx_queues)
> > > + dev->_rx[queue_id].umem = NULL;
> > > + if (queue_id < dev->num_tx_queues)
> > > + dev->_tx[queue_id].umem = NULL;
> > > +}
> > > +
> >
> > I am sure whether the following scenario can happen or not.
> > Could you clarify?
> > 1. suppose initially we have num_rx_queues = num_tx_queues = 10
> > xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> > 2. num_tx_queues is changed to 5
>
> You probably mean real_num_tx_queues here. This is the current number
> of queues configured via e.g. ethtool. num_tx_queues will not change
> unless you change device (or device driver).
I am actually not really familiar with this area of
code. Your explanation definitely helps my understanding.
>
> > 3. xdp_clear_umem_at_qid() is called for queue_id = 8,
> > and dev->_rx[8].umum = 0.
>
> At this point both _rx[8].umem and _tx[8].umem are set to NULL as the
> test is against num_[rx|tx]_queues which is the max allowed for the
> device, not the current allocated one which is real_num_tx_queues.
> With this in mind, the scenario below will not happen. Do you agree?
I agree. If num_{rx,tx}_queues are not changed, than clear should set
both with NULL and the step 4/5 won't happen any more.
>
> > 4. xdp_reg_umem_at_qid() is called gain to set for queue_id = 8
> > dev->_rx[8].umem = umem2
> > 5. num_tx_queues is changed to 10
> > Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> > a problem?
> >
> > > int xdp_umem_query(struct net_device *dev, u16 queue_id)
> > > {
> > > struct netdev_bpf bpf;
> > > @@ -58,11 +93,11 @@ int xdp_umem_query(struct net_device *dev, u16 queue_id)
> > > }
> > >
> > > int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > > - u32 queue_id, u16 flags)
> > > + u16 queue_id, u16 flags)
> > > {
> > > bool force_zc, force_copy;
> > > struct netdev_bpf bpf;
> > > - int err;
> > > + int err = 0;
> > >
> > > force_zc = flags & XDP_ZEROCOPY;
> > > force_copy = flags & XDP_COPY;
> > > @@ -70,18 +105,28 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > > if (force_zc && force_copy)
> > > return -EINVAL;
> > >
> > > + rtnl_lock();
> > > + if (xdp_get_umem_from_qid(dev, queue_id)) {
> > > + err = -EBUSY;
> > > + goto rtnl_unlock;
> > > + }
> > > +
> > > + xdp_reg_umem_at_qid(dev, umem, queue_id);
> > > + umem->dev = dev;
> > > + umem->queue_id = queue_id;
> > > if (force_copy)
> > > - return 0;
> > > + /* For copy-mode, we are done. */
> > > + goto rtnl_unlock;
> > >
> > > - if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
> > > - return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
> > > + if (!dev->netdev_ops->ndo_bpf ||
> > > + !dev->netdev_ops->ndo_xsk_async_xmit) {
> > > + err = -EOPNOTSUPP;
> > > + goto err_unreg_umem;
> > > + }
> > >
> > > - rtnl_lock();
> > > err = xdp_umem_query(dev, queue_id);
> > > - if (err) {
> > > - err = err < 0 ? -EOPNOTSUPP : -EBUSY;
> > > - goto err_rtnl_unlock;
> > > - }
> > > + if (err)
> > > + goto err_unreg_umem;
> > >
> > > bpf.command = XDP_SETUP_XSK_UMEM;
> > > bpf.xsk.umem = umem;
> > > @@ -89,18 +134,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > >
> > > err = dev->netdev_ops->ndo_bpf(dev, &bpf);
> > > if (err)
> > > - goto err_rtnl_unlock;
> > > + goto err_unreg_umem;
> > > rtnl_unlock();
> > >
> > > dev_hold(dev);
> > > - umem->dev = dev;
> > > - umem->queue_id = queue_id;
> > > umem->zc = true;
> > > return 0;
> > >
> > > -err_rtnl_unlock:
> > > +err_unreg_umem:
> > > + xdp_clear_umem_at_qid(dev, queue_id);
> >
> > You did not clear umem->dev and umem->queue_id,is a problem here?
> > For example in xdp_umem_clear_dev(), umem->dev is checked.
>
> As the umem might be shared, I cannot clear these variables as they
> are used by another socket when it is shared.
>
> The umem->dev check in xdp_umem_clear_dev() is for sockets that are
> killed when only half-ways set up. In that case, umem->dev might still
> be NULL.
Make sense.
>
> > > + if (!force_zc)
> > > + err = 0; /* fallback to copy mode */
> > > +rtnl_unlock:
> > > rtnl_unlock();
> > > - return force_zc ? err : 0; /* fail or fallback */
> > > + return err;
> > > }
> > >
> > > static void xdp_umem_clear_dev(struct xdp_umem *umem)
> > > @@ -108,7 +155,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
> > > struct netdev_bpf bpf;
> > > int err;
> > >
> > > - if (umem->dev) {
> > > + if (umem->zc) {
> > > bpf.command = XDP_SETUP_XSK_UMEM;
> > > bpf.xsk.umem = NULL;
> > > bpf.xsk.queue_id = umem->queue_id;
> > > @@ -121,7 +168,13 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
> > > WARN(1, "failed to disable umem!\n");
> > >
> > > dev_put(umem->dev);
> > > - umem->dev = NULL;
> > > + umem->zc = false;
> > > + }
> > > +
> > > + if (umem->dev) {
> > > + rtnl_lock();
> > > + xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
> > > + rtnl_unlock();
> >
> > Previously, umem->dev is reset to NULL. Now, it is left as is. I
> > assume it is not possible
> > that this function xdp_umem_clear_dev() is called again for this umem, right?
>
> This function will only be called once for each umem / queue_id pair.
> We only allow one such binding with this patch. This is what was
> broken with the original code.
Thanks for explanation.
>
> Thanks Song for your review. I appreciate that you take a look at my code.
>
> /Magnus
>
> > > }
> > > }
> > >
> > > diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
> > > index c8be1ad..2760322 100644
> > > --- a/net/xdp/xdp_umem.h
> > > +++ b/net/xdp/xdp_umem.h
> > > @@ -9,7 +9,7 @@
> > > #include <net/xdp_sock.h>
> > >
> > > int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> > > - u32 queue_id, u16 flags);
> > > + u16 queue_id, u16 flags);
> > > bool xdp_umem_validate_queues(struct xdp_umem *umem);
> > > void xdp_get_umem(struct xdp_umem *umem);
> > > void xdp_put_umem(struct xdp_umem *umem);
> > > --
> > > 2.7.4
> > >
Powered by blists - more mailing lists