netdev - Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJ8uoz2m2amnk_aFwpyGg8Ss20vbt5EeVbXLEXQeO+gfEBc_cg@mail.gmail.com>
Date:   Thu, 20 Sep 2018 11:17:17 +0200
From:   Magnus Karlsson <magnus.karlsson@...il.com>
To:     jakub.kicinski@...ronome.com
Cc:     ys114321@...il.com, "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Björn Töpel <bjorn.topel@...el.com>,
        ast@...nel.org, Daniel Borkmann <daniel@...earbox.net>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy
 and zero-copy on one queue id

On Wed, Sep 19, 2018 at 9:18 AM Magnus Karlsson
<magnus.karlsson@...il.com> wrote:
>
> On Wed, Sep 19, 2018 at 3:58 AM Jakub Kicinski
> <jakub.kicinski@...ronome.com> wrote:
> >
> > On Tue, 18 Sep 2018 10:22:11 -0700, Y Song wrote:
> > > > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > > > + * not know if the device has more tx queues than rx, or the opposite.
> > > > + * This might also change during run time.
> > > > + */
> > > > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > > > +                               u16 queue_id)
> > > > +{
> > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > +               dev->_rx[queue_id].umem = umem;
> > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > +               dev->_tx[queue_id].umem = umem;
> > > > +}
> > > > +
> > > > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > > > +                                             u16 queue_id)
> > > > +{
> > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > +               return dev->_rx[queue_id].umem;
> > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > +               return dev->_tx[queue_id].umem;
> > > > +
> > > > +       return NULL;
> > > > +}
> > > > +
> > > > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > > > +{
> > > > +       /* Zero out the entry independent on how many queues are configured
> > > > +        * at this point in time, as it might be used in the future.
> > > > +        */
> > > > +       if (queue_id < dev->num_rx_queues)
> > > > +               dev->_rx[queue_id].umem = NULL;
> > > > +       if (queue_id < dev->num_tx_queues)
> > > > +               dev->_tx[queue_id].umem = NULL;
> > > > +}
> > > > +
> > >
> > > I am sure whether the following scenario can happen or not.
> > > Could you clarify?
> > >    1. suppose initially we have num_rx_queues = num_tx_queues = 10
> > >        xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> > >    2. num_tx_queues is changed to 5
> > >    3. xdp_clear_umem_at_qid() is called for queue_id = 8,
> > >        and dev->_rx[8].umum = 0.
> > >    4. xdp_reg_umem_at_qid() is called gain to set for queue_id = 8
> > >        dev->_rx[8].umem = umem2
> > >    5. num_tx_queues is changed to 10
> > >   Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> > > a problem?
> >
> > Plus IIRC the check of qid vs real_num_[rt]x_queues in xsk_bind() is
> > not under rtnl_lock so it doesn't count for much.  Why not do all the
> > checks against num_[rt]x_queues here, instead of real_..?
>
> You are correct, two separate rtnl_lock regions is broken. Will spin a
> v2 tomorrow when I am back in the office.
>
> Thanks Jakub for catching this. I really appreciate you reviewing my code.

Sorry, forgot to answer your question about why real_num_ instead of
num_. If we used num_ instead, we would get a behavior where we can
open a socket on a queue id, let us say 10, that will not be active if
real_num_ is below 10. So no traffic will flow on this socket until we
issue an ethtool command to state that we have 10 or more queues
configured. While this behavior is sane (and consistent), I believe it
will lead to a more complicated driver implementation, break the
current uapi (since we will return an error if you try to bind to a
queue id that is not active at the moment) and I like my suggestion
below better :-).

What I would like to suggest is that the current model at bind() is
kept, that is it will fail if you try to bind to a non-active queue
id. But we add some code in ethool_set_channels in net/core/ethtool.c
to check if you have an active af_xdp socket on a queue id and try to
make it inactive, we return an error and disallow the change. This
would IMHO be like not allowing a load module to be unloaded if
someone is using it or not allowing unmounting a file system if files
are in use. What do you think? If you think this is the right way to
go, I can implement this in a follow up patch or in this patch set if
that makes more sense, let me know. This suggestion would actually
make it simpler to implement zero-copy support in the driver. Some of
the complications we are having today is due to the fact that ethtool
can come in from the side and change things for us when an AF_XDP
socket is active on a queue id. And implementing zero copy support in
the driver needs to get simpler IMO.

Please let me know what you think.

Thanks: Magnus

> /Magnus