lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJ+HfNjBOt6f17vGHxvi5rUMWt3R98+B2jgLOf=0_uvTyNnUTw@mail.gmail.com>
Date:   Tue, 31 Jul 2018 09:15:42 +0200
From:   Björn Töpel <bjorn.topel@...il.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Björn Töpel <bjorn.topel@...el.com>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        oss-drivers@...ronome.com, Netdev <netdev@...r.kernel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Francois Ozog <francois.ozog@...aro.org>,
        MykytaI Iziumtsev <mykyta.iziumtsev@...aro.org>
Subject: Re: [RFC bpf-next 0/6] net: xsk: minor improvements around queue handling

Den tis 31 juli 2018 kl 04:49 skrev Jakub Kicinski
<jakub.kicinski@...ronome.com>:
>
> On Mon, 30 Jul 2018 14:49:32 +0200, Björn Töpel wrote:
> > Den tors 26 juli 2018 kl 23:44 skrev Jakub Kicinski:
> > >
> > > Hi!
> >
> > Thanks for spending your time on this, Jakub. I'm (temporarily) back
> > for a week, so you can expect faster replies now...
> >
> > > This set tries to make the core take care of error checking for the
> > > drivers.  In particular making sure that the AF_XDP UMEM is not installed
> > > on queues which don't exist (or are disabled) and that changing queue
> > > (AKA ethtool channel) count cannot disable queues with active AF_XDF
> > > zero-copy sockets.
> > >
> > > I'm sending as an RFC because I'm not entirely sure what the desired
> > > behaviour is here.  Is it Okay to install AF_XDP on queues which don't
> > > exist?  I presume not?
> >
> > Your presumption is correct. The idea with the
> > real_num_rx_queues/real_num_tx_queues check in xsk_bind, is to bail
> > out if the queue doesn't exist at bind call. Note that we *didn't* add
> > any code to avoid the bound queue from being removed via set channel
> > (your patch 6). Our idea was that if you remove a queue, the ingress
> > frames would simply stop flowing, and the queue config change checking
> > was "out-of-band".
> >
> > I think I prefer your approach, i.e. not allowing the channels/queues
> > to change if they're bound to an AF_XDP socket. However, your
> > xdp_umem_query used in ethtool only works for ZC enabled drivers, not
> > for the existing non-ZC/copy case. If we'd like to go the route of
> > disabling ethtool_set_channels for an AF_XDP enabled queue this
> > functionality needs to move the query into netdev core, so we have a
> > consistent behavior.
>
> Agreed.  There seems to be no notification for changing the number of
> queues and therefore no very clean way to solve this today.

I'm probably lacking some history here; Has there been any past
efforts in making channels/queues a "kernel object"? Would it make
sense to add notifications for queue changes analogous to netdev
changes?

> The last
> two patches are more of a courtesy to the drivers, to simplify the
> data structure for holding the UMEMs.
>
> I could argue that driver and stack are not really apples to apples.
> Much like Generic XDP, the skb-based AF_XDP is basically a development
> tool and last-resort fallback.

Hmm... I partially agree. Let me think about it a bit more.

> For TX driver will most likely allocate
> separate queues, while skb-based will use the stack's queues.  These
> are actually different queues.  Stack will also fallback to other queue
> in __netdev_pick_tx() if number of queues changes.
>

Yup, ideally the driver will use a dedicated queue. We had some
thoughts on hijacking the skb Tx queue, and route the stack egress
packets elsewhere, but it ended up way too messy.

> But yes, preferably skb-based and ZC should behave the same..
>
> > > Are the AF_XDP queue_ids referring to TX queues
> > > as well as RX queues in case of the driver?  I presume not?
> >
> > We've had a lot of discussions about this internally. Ideally, we'd
> > like to give driver implementors the most freedom, and not enforcing a
> > certain queue scheme for Tx.
>
> You say freedom I hear diverging implementations and per-driver
> checks in user space ;-)
>

Yeah. :-) Well, for the i40e ZC implementation, the Tx queue id was
not equal to Rx queue id, so to answer your question: "Correct, the
queue id refer to Rx."

> Practically speaking unless you take the xmit lock there is little
> chance of reusing stack's TX queues, so you'd have to allocate a
> separate queue one way or the other..  At which point the number of
> stack's TX queues has no bearing on AF_XDP ZC.
>

Yup, you're right.

> > OTOH it makes it weird for the userland
> > application *not* to have the same id, e.g. if a userland application
> > would like to get stats or configure the AF_XDP bound Tx queue --
> > which id is it? Should the Tx queue id  for an xsk be exposed in
> > sysfs?
>
> I'd not go there.
>

Honestly, me neither. I need to think more about to expose the Tx
queue pulls/knobs for a control plane.

> > If the id is *not* the same, would it be OK to change the number of
> > channels and Tx would continue to operate correctly? A related
> > question; An xsk with *only* Tx, should it be constrained by the
> > number of (enabled) Rx queues?
>
> Good question, are drivers even supposed to care about tx-only/rx-only?
> From driver's perspective rx-only socket will simply never have
> anything to transmit, and tx-only socket will never successfully
> xdp_do_redirect().  So IMHO - yes, tx-only XSK still only cares about
> RX queues, driver doesn't know no RX will ever happen.  The fact that
> user has to populate the FILL queue on a tx-only socket may be counter
> intuitive to many...
>

Not allowing tx-only/rx-only definitely makes it easier. Many NICs,
however, have non-symmetrical # of Tx/Rx queues. Scenarios where one
would have #txq >> #rxq would then be hard to support. Again this
could be worked around with virtual netdevs, so maybe it makes sense
to keep the socket layer simple.

> I was considering proposing a change to drop the *x-only option in the
> net tree, I don't really see much use for it :S  And it potentially
> creates weird corner cases.
>
> > I'd be happy to hear some more opinions/thoughts on this...
>
> +1
>
> > > Should we try to prevent disabling queues which have non zero-copy
> > > sockets installed as well? :S
> > >
> >
> > Yes, the ZC/non-ZC case must be consistent IMO. See comment above.
> >
> > > Anyway, if any of those patches seem useful and reasonable, please
> > > let me know I will repost as non-RFC.
> > >
> >
> > I definitely think patch 2 and 3 (and probably 1) should go as
> > non-RFC!
>
> Thanks Björn, I will submit the first three!
>

Thanks! I've already taken them for a spin and acked them!


Björn

> > Thanks for spotting that we're not holding the rtnl lock when checking
> > the # queues (patch 4)!
> >
> > Björn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ