Message-ID: <CAJ8uoz0Q3GQ9T0_vhqB64LVBVJ6uR-ZMyT+3=yp+rrVDrOUfFA@mail.gmail.com>
Date:   Mon, 24 Sep 2018 10:37:24 +0200
From:   Magnus Karlsson <magnus.karlsson@...il.com>
To:     jakub.kicinski@...ronome.com
Cc:     Y Song <ys114321@...il.com>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Björn Töpel <bjorn.topel@...el.com>,
        ast@...nel.org, Daniel Borkmann <daniel@...earbox.net>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy
 and zero-copy on one queue id

On Thu, Sep 20, 2018 at 4:41 PM Jakub Kicinski
<jakub.kicinski@...ronome.com> wrote:
>
> On Thu, 20 Sep 2018 11:17:17 +0200, Magnus Karlsson wrote:
> > On Wed, Sep 19, 2018 at 9:18 AM Magnus Karlsson wrote:
> > > On Wed, Sep 19, 2018 at 3:58 AM Jakub Kicinski wrote:
> > > >
> > > > On Tue, 18 Sep 2018 10:22:11 -0700, Y Song wrote:
> > > > > > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > > > > > + * not know if the device has more tx queues than rx, or the opposite.
> > > > > > + * This might also change during run time.
> > > > > > + */
> > > > > > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
> > > > > > +                               u16 queue_id)
> > > > > > +{
> > > > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > > > +               dev->_rx[queue_id].umem = umem;
> > > > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > > > +               dev->_tx[queue_id].umem = umem;
> > > > > > +}
> > > > > > +
> > > > > > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > > > > > +                                             u16 queue_id)
> > > > > > +{
> > > > > > +       if (queue_id < dev->real_num_rx_queues)
> > > > > > +               return dev->_rx[queue_id].umem;
> > > > > > +       if (queue_id < dev->real_num_tx_queues)
> > > > > > +               return dev->_tx[queue_id].umem;
> > > > > > +
> > > > > > +       return NULL;
> > > > > > +}
> > > > > > +
> > > > > > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > > > > > +{
> > > > > > +       /* Zero out the entry independent on how many queues are configured
> > > > > > +        * at this point in time, as it might be used in the future.
> > > > > > +        */
> > > > > > +       if (queue_id < dev->num_rx_queues)
> > > > > > +               dev->_rx[queue_id].umem = NULL;
> > > > > > +       if (queue_id < dev->num_tx_queues)
> > > > > > +               dev->_tx[queue_id].umem = NULL;
> > > > > > +}
> > > > > > +
> > > > >
> > > > > > I am not sure whether the following scenario can happen or not.
> > > > > Could you clarify?
> > > > >    1. suppose initially we have num_rx_queues = num_tx_queues = 10
> > > > >        xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> > > > >    2. num_tx_queues is changed to 5
> > > > >    3. xdp_clear_umem_at_qid() is called for queue_id = 8,
> > > > >        and dev->_rx[8].umum = 0.
> > > > >    4. xdp_reg_umem_at_qid() is called gain to set for queue_id = 8
> > > > >        dev->_rx[8].umem = umem2
> > > > >    5. num_tx_queues is changed to 10
> > > > >   Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
> > > > > a problem?
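The five steps above can be replayed in a small user-space model. This is only a sketch: `struct model_dev`, `struct model_umem` and the `model_*` functions are hypothetical stand-ins, not the kernel structures, and the queue counts that change in steps 2 and 5 are assumed to be the `real_num_*` values, with `MAX_QUEUES` playing the role of the fixed `num_*_queues` array size.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUES 10 /* plays the role of num_rx_queues == num_tx_queues */

/* Hypothetical user-space stand-ins; not the kernel structures. */
struct model_umem { int id; };

struct model_dev {
	unsigned int real_num_rx_queues;
	unsigned int real_num_tx_queues;
	struct model_umem *rx_umem[MAX_QUEUES];
	struct model_umem *tx_umem[MAX_QUEUES];
};

/* Mirrors xdp_reg_umem_at_qid(): bounded by the real_num_* counts. */
static void model_reg_umem(struct model_dev *dev, struct model_umem *umem,
			   unsigned int qid)
{
	assert(qid < MAX_QUEUES);
	if (qid < dev->real_num_rx_queues)
		dev->rx_umem[qid] = umem;
	if (qid < dev->real_num_tx_queues)
		dev->tx_umem[qid] = umem;
}

/* Mirrors xdp_clear_umem_at_qid(): bounded by the allocated array size. */
static void model_clear_umem(struct model_dev *dev, unsigned int qid)
{
	if (qid < MAX_QUEUES) {
		dev->rx_umem[qid] = NULL;
		dev->tx_umem[qid] = NULL;
	}
}

/* Replays steps 1-5; returns 1 if the rx and tx entries for queue 8 differ. */
int model_replay_scenario(void)
{
	struct model_umem umem1 = { 1 }, umem2 = { 2 };
	struct model_dev dev = { .real_num_rx_queues = 10,
				 .real_num_tx_queues = 10 };

	model_reg_umem(&dev, &umem1, 8);	/* step 1: rx and tx both set */
	dev.real_num_tx_queues = 5;		/* step 2 */
	model_clear_umem(&dev, 8);		/* step 3: both entries cleared */
	model_reg_umem(&dev, &umem2, 8);	/* step 4: only the rx entry is set */
	dev.real_num_tx_queues = 10;		/* step 5 */

	return dev.rx_umem[8] != dev.tx_umem[8];
}
```

Under these assumptions the two entries do diverge: the rx slot holds the second umem while the tx slot stays NULL, because registration checks against `real_num_*` while clearing checks against the allocated size.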
> > > >
> > > > Plus IIRC the check of qid vs real_num_[rt]x_queues in xsk_bind() is
> > > > not under rtnl_lock so it doesn't count for much.  Why not do all the
> > > > checks against num_[rt]x_queues here, instead of real_..?
> > >
> > > You are correct, having two separate rtnl_lock regions is broken. Will
> > > spin a v2 tomorrow when I am back in the office.
> > >
> > > Thanks Jakub for catching this. I really appreciate you reviewing my code.
> >
> > Sorry, forgot to answer your question about why real_num_ instead of
> > num_. If we used num_ instead, we would get a behavior where we can
> > open a socket on a queue id, let us say 10, that will not be active if
> > real_num_ is below 10. So no traffic will flow on this socket until we
> > issue an ethtool command to state that we have 10 or more queues
> > configured. While this behavior is sane (and consistent), I believe it
> > will lead to a more complicated driver implementation and break the
> > current uapi (since today we return an error if you try to bind to a
> > queue id that is not active at the moment). I also like my suggestion
> > below better :-).
> >
> > What I would like to suggest is that the current model at bind() is
> > kept, that is, it will fail if you try to bind to a non-active queue
> > id. But we add some code in ethtool_set_channels in net/core/ethtool.c
> > that checks whether you have an active AF_XDP socket on a queue id.
> > If the change would make that queue id inactive, we return an error
> > and disallow the change. This would IMHO be like not allowing a
> > loaded module to be unloaded if someone is using it, or not allowing
> > a file system to be unmounted if files are in use. What do you think?
> > If you think this is the right way to go, I can implement this in a
> > follow-up patch or in this patch set if that makes more sense, let me
> > know. This suggestion would actually make it simpler to implement
> > zero-copy support in the driver. Some of the complications we are
> > having today are due to the fact that ethtool can come in from the
> > side and change things for us when an AF_XDP socket is active on a
> > queue id. And implementing zero-copy support in the driver needs to
> > get simpler IMO.
> >
> > Please let me know what you think.
>
> I think I'd like that:
> https://patchwork.ozlabs.org/project/netdev/list/?series=57843&state=*
> in particular:
> https://patchwork.ozlabs.org/patch/949891/
> ;)

Perfect :-). I will drop the code I wrote to fix this and use yours in
my patch set. Please let me know if that does not work for you.

> If we fix the resize indeed you could stick to real_* but to be honest,
> I'm not entirely clear on what the advantage would be?  The arrays are
> sized to num..queues, do you envision a use case where having
> xdp_get_umem_from_qid() return a NULL when asked for qid > real_..
> would help?  Is there one for xdp_reg_umem_at_qid()?

I just think that testing against real_ combined with not allowing the
removal of queues that have an active AF_XDP socket makes it consistent.
You cannot start an AF_XDP socket on a non-active queue_id. You have
to make the queue active (increase real_num_ so it encompasses the
queue in question) first, then you can start your socket, and in order
to decrease real_num_, you first have to remove the socket. AF_XDP
sockets then only exist on active queue ids. This should make the state
handling simpler, IMHO.
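
To make the intended state handling concrete, here is a minimal user-space sketch of the rule described above. It is not the kernel code: `struct chan_model` and the `model_*` functions are hypothetical, rx and tx are collapsed into a single queue count for brevity, and the specific error values (-EINVAL, -EBUSY) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_QUEUES 16 /* hypothetical allocated queue count */

struct chan_model {
	unsigned int real_num_queues;	/* currently active queues */
	bool xsk_bound[MAX_QUEUES];	/* true if a socket is bound to the qid */
};

/* bind(): only succeeds on a queue id that is active right now. */
int model_bind(struct chan_model *m, unsigned int qid)
{
	if (qid >= m->real_num_queues)
		return -EINVAL;
	if (m->xsk_bound[qid])
		return -EBUSY;
	m->xsk_bound[qid] = true;
	return 0;
}

/* Closing the socket releases the queue id. */
void model_unbind(struct chan_model *m, unsigned int qid)
{
	assert(qid < MAX_QUEUES);
	m->xsk_bound[qid] = false;
}

/* Channel change: refused if it would deactivate a queue with a socket. */
int model_set_channels(struct chan_model *m, unsigned int new_count)
{
	unsigned int qid;

	if (new_count == 0 || new_count > MAX_QUEUES)
		return -EINVAL;
	for (qid = new_count; qid < m->real_num_queues; qid++)
		if (m->xsk_bound[qid])
			return -EBUSY;
	m->real_num_queues = new_count;
	return 0;
}
```

With this rule, a bound socket can never end up on a queue id at or above the active count, so neither the core nor the driver has to handle a socket whose queue disappears underneath it.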

/Magnus

> I have no strong preference here, just wondering.
