Message-ID: <20181002095835.6cd49d57@cakuba.netronome.com>
Date:   Tue, 2 Oct 2018 09:58:35 -0700
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     Magnus Karlsson <magnus.karlsson@...il.com>
Cc:     "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Björn Töpel <bjorn.topel@...el.com>, ast@...nel.org,
        Daniel Borkmann <daniel@...earbox.net>,
        Network Development <netdev@...r.kernel.org>,
        Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both
 copy and zero-copy mode

On Tue, 2 Oct 2018 14:49:13 +0200, Magnus Karlsson wrote:
> On Mon, Oct 1, 2018 at 10:34 PM Jakub Kicinski wrote:
> > On Mon,  1 Oct 2018 14:51:32 +0200, Magnus Karlsson wrote:  
> > > Jakub, please take a look at your patches. The last one I had to
> > > change slightly to make it fit with the new interface
> > > xdp_get_umem_from_qid(). An added bonus with this function is that
> > > we can, in the future, also use it from the driver to get a umem,
> > > thus simplifying driver implementations (and later removing the umem
> > > from the NDO completely). Björn will mail patches at a later point
> > > in time that use this in the i40e and ixgbe drivers and remove a
> > > good chunk of code from the ZC implementations.
> >
> > Nice, drivers which don't follow the prepare/commit model of handling
> > reconfigurations will benefit!
> >  
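
A rough sketch of the driver-side use described above, assuming the
helper is exported to drivers via net/xdp_sock.h; the driver type and
field names (mydrv_ring, netdev, queue_index) are made up for
illustration:

/* Sketch only: the driver asks the core for the umem bound to a
 * queue id instead of keeping its own copy of the pointer handed
 * down through the NDO.
 */
#include <linux/netdevice.h>
#include <net/xdp_sock.h>       /* xdp_get_umem_from_qid() */

struct mydrv_ring {
        struct net_device *netdev;
        u16 queue_index;
};

static struct xdp_umem *mydrv_ring_umem(struct mydrv_ring *ring)
{
        /* NULL means no AF_XDP socket is bound to this queue. */
        return xdp_get_umem_from_qid(ring->netdev, ring->queue_index);
}
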
> > > I also made your code aware of Tx queues. If we create a socket that
> > > only has a Tx queue, then the queue id will refer to a Tx queue id
> > > only and could be larger than the number of available Rx queues.
> > > Please take a look at it.
> >
> > The semantics of the Tx queue id are slightly unclear.  To me XDP is
> > associated with Rx, so the qid in driver context can only refer to an
> > Rx queue and its associated XDP Tx queue.  It does not mean the Tx
> > queue the stack uses, like it does for copy fallback.  If one doesn't
> > have an Rx queue $id, there will be no associated XDP Tx queue $id (in
> > all drivers but Intel, and virtio, which use per-CPU Tx queues, making
> > the Tx queue id even more meaningless).
> >
> > It remains to be seen how others implement AF_XDP.  My general feeling
> > is that we should only talk about Rx queues in the context of driver
> > XDP.
> 
> This is the way I see it. From a uapi point of view we can create a
> socket that can only do Rx, only Tx, or both. We then bind this socket
> to a specific queue id on a device. If a packet is received on this
> queue id it is sent (by the default xdpsock sample program) to the
> socket. If a packet is sent on this socket it goes out on this same
> queue id. If you have not registered an Rx ring (in user space) for
> this socket, you cannot receive anything on this socket. And
> conversely, if you have no Tx ring, you will not be able to send
> anything.
> 
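
As an illustration of that uapi, a minimal sketch of a Tx-only socket
(umem registration and the fill/completion rings are omitted for
brevity, although a real socket needs them before bind; ifindex and
queue_id are assumed to come from elsewhere):

#include <linux/if_xdp.h>
#include <sys/socket.h>

#ifndef AF_XDP
#define AF_XDP 44       /* not yet in older libc headers */
#endif
#ifndef SOL_XDP
#define SOL_XDP 283
#endif

static int xsk_tx_only_sketch(int ifindex, __u32 queue_id)
{
        int fd = socket(AF_XDP, SOCK_RAW, 0);
        int tx_size = 2048;
        struct sockaddr_xdp sxdp = {
                .sxdp_family   = AF_XDP,
                .sxdp_ifindex  = ifindex,
                .sxdp_queue_id = queue_id, /* may exceed Rx queue count */
        };

        if (fd < 0)
                return -1;

        /* Only a Tx ring is registered; with no XDP_RX_RING the socket
         * cannot receive, regardless of what the driver sets up.
         */
        if (setsockopt(fd, SOL_XDP, XDP_TX_RING, &tx_size, sizeof(tx_size)))
                return -1;

        return bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
}
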
> But if we take a look at this from the driver perspective and the NDO
> XDP_SETUP_XSK_UMEM, today it does not know anything about whether Rx
> and Tx rings have been set up in the socket. It will always initialize
> the HW Rx and Tx queues of the supplied queue id. So with today's NDO
> interface you will always get an Rx/Tx queue pair. In order to realize
> the uapi above in an efficient manner and to support devices with more
> Tx queues than Rx, we need to change the NDO.
> 
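
For reference, what the driver sees through that NDO today is roughly
just a umem pointer and a queue id (abridged from struct netdev_bpf);
nothing tells it which rings the socket actually registered:

/* XDP_SETUP_XSK_UMEM carries only this much; umem == NULL means
 * disable AF_XDP zero-copy on that queue id.
 */
struct netdev_bpf {
        enum bpf_netdev_command command;
        union {
                /* ... other commands elided ... */

                /* XDP_SETUP_XSK_UMEM */
                struct {
                        struct xdp_umem *umem;
                        u16 queue_id;
                } xsk;
        };
};
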
> Just as a note, in the applications I am used to working on, radio base
> stations and other telecom apps, it is the common case to have many
> more Tx queues than Rx queues just to be able to use the scheduling,
> shaping and other QoS features that are important on egress in those
> systems. Hence the interest in supporting Tx-only queues. But maybe
> this is just a weird case, I do not know.

It's a good case; it should be supported.  I'm just wondering whether
the API we have today is going to be the right one.  So for i40e you
actually allocate a Tx ring per Rx ring?  In ixgbe IIUC there is an XDP
Tx ring per core, so regardless of how many Tx queues one requests there
will actually be num_cpu_ids XDP Tx queues... so even the check against
Rx isn't meaningful there.  Hm..  Okay, I think what you've done is the
safest bet; we can always relax the check later on.

LGTM, sorry for the noise! :)
