Date:	Tue, 12 Jul 2016 22:32:31 +0200
From:	Jesper Dangaard Brouer <brouer@...hat.com>
To:	John Fastabend <john.fastabend@...il.com>
Cc:	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Jakub Kicinski <jakub.kicinski@...ronome.com>,
	"Fastabend, John R" <john.r.fastabend@...el.com>,
	"iovisor-dev@...ts.iovisor.org" <iovisor-dev@...ts.iovisor.org>,
	Brenden Blanco <bblanco@...mgrid.com>,
	Rana Shahout <ranas@...lanox.com>, Ari Saha <as754m@....com>,
	Tariq Toukan <tariqt@...lanox.com>,
	Or Gerlitz <ogerlitz@...lanox.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Simon Horman <horms@...ge.net.au>,
	Simon Horman <simon.horman@...ronome.com>,
	Edward Cree <ecree@...arflare.com>, brouer@...hat.com
Subject: Re: XDP seeking input from NIC hardware vendors

On Tue, 12 Jul 2016 12:13:01 -0700
John Fastabend <john.fastabend@...il.com> wrote:

> On 16-07-11 07:24 PM, Alexei Starovoitov wrote:
> > On Sat, Jul 09, 2016 at 01:27:26PM +0200, Jesper Dangaard Brouer wrote:  
> >> On Fri, 8 Jul 2016 18:51:07 +0100
> >> Jakub Kicinski <jakub.kicinski@...ronome.com> wrote:
> >>  
> >>> On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:  
> >>>> The only distinction between VFs and queue groupings on my side is that
> >>>> VFs provide RSS, whereas queue groupings have to be selected explicitly.
> >>>> In a programmable NIC world the distinction might be lost if an "RSS"
> >>>> program can be loaded into the NIC to select queues, but for existing
> >>>> hardware the distinction is there.
> >>>
> >>> To do BPF RSS we need a way to select the queue, which I think is all
> >>> Jesper wanted.  So we will have to tackle queue selection at some
> >>> point.  The main obstacle for me is defining what queue selection
> >>> means when the program is not offloaded to HW...  Implementing
> >>> queue selection on the HW side is trivial.
> >>
> >> Yes, I do see the problem of fallback, when the program's "filter" demux
> >> cannot be offloaded to hardware.
> >>
> >> At first I thought it was a good idea to keep the "demux-filter" part of
> >> the eBPF program, as a software fallback could still apply this filter in
> >> SW and just mark the packets as not-zero-copy-safe.  But when HW
> >> offloading is not possible, packets can be delivered on every RX
> >> queue, and SW would need to handle that, which is hard to keep transparent.
> >>
> >>  
> >>>> If you demux using an eBPF program or via a filter model like
> >>>> flow_director or cls_{u32|flower}, I think we can support both. This
> >>>> just depends on the programmability of the hardware. Note that
> >>>> flow_director and cls_{u32|flower} steering to VFs is already in place.
> >>
> >> Maybe we should keep HW demuxing as a separate setup step.
> >>
> >> Today I can almost do what I want: by setting up ntuple filters and (if
> >> Alexei allows it) assigning an application-specific XDP eBPF program to a
> >> specific RX queue.
> >>
> >>  ethtool -K eth2 ntuple on
> >>  ethtool -N eth2 flow-type udp4 dst-ip 192.168.254.1 dst-port 53 action 42
> >>
> >> Then the XDP program can be attached to RX queue 42 and
> >> promise/guarantee that it will consume all packets.  The backing
> >> page-pool can then allow zero-copy RX (and enable scrubbing when
> >> refilling the pool).
> > 
> > so such an ntuple rule will send udp4 traffic for a specific ip and port
> > into a queue, and then it will somehow get zero-copied to a vm?
> > . looks like a lot of other pieces about zero-copy and qemu need to be
> > implemented (or at least architected) for this scheme to be conceivable
> > . and when all that happens, what is the vm going to do with this very
> > specific traffic? the vm won't have any tcp or even ping?
> 
> I have perhaps a different motivation for having queue steering in 'tc
> cls-u32' and eventually xdp. The general idea is that I have thousands of
> queues and I can bind applications to the queues. When I know an
> application is bound to a queue, I can enable per-queue busy polling (to
> be implemented), set specific interrupt rates on the queue
> (implementation will be posted soon), bind the queue to the correct
> cpu, etc.

+1 for binding applications to queues.

This is basically what our customers are requesting. They have one or
two applications that need DPDK speeds.  But they don't like dedicating
an entire NIC per application (like DPDK requires).

The basic idea is actually more fundamental.  It reminds me of Van
Jacobson's netchannels[1], where he talks about "channelize" (slides 24+).
Creating a full "application" channel allows for a lock-free single-producer
single-consumer (SPSC) queue directly into the application.

[1] http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf
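
To make the "channelize" idea concrete, here is a minimal user-space
sketch of the kind of lock-free SPSC ring such an application channel
boils down to.  This is only an illustration (not an existing kernel or
driver API); the names and sizes are made up:

  /* Single-producer/single-consumer ring: the RX-queue side enqueues,
   * the application dequeues.  No locks, only acquire/release ordering
   * on the head and tail indexes. */
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stddef.h>

  #define RING_SIZE 256                    /* must be a power of two */

  struct spsc_ring {
          _Atomic size_t head;             /* written by producer    */
          _Atomic size_t tail;             /* written by consumer    */
          void *slot[RING_SIZE];           /* e.g. packet-page ptrs  */
  };

  static bool spsc_enqueue(struct spsc_ring *r, void *pkt)
  {
          size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
          size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

          if (head - tail == RING_SIZE)
                  return false;            /* full: drop or back off */
          r->slot[head & (RING_SIZE - 1)] = pkt;
          atomic_store_explicit(&r->head, head + 1, memory_order_release);
          return true;
  }

  static void *spsc_dequeue(struct spsc_ring *r)
  {
          size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
          size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

          if (tail == head)
                  return NULL;             /* empty */
          void *pkt = r->slot[tail & (RING_SIZE - 1)];
          atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
          return pkt;
  }

Once an RX queue is owned by exactly one application, delivery can be
this simple: no locks, just two ordered index updates, with the
page-pool behind it providing the zero-copy backing.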


> ntuple works OK for this now, but xdp provides more flexibility and
> also lets us add additional policy on the queue beyond simply
> queue steering.
> 
> I'm not convinced though that the demux queue selection should be part
> of the XDP program itself, because it has no software analog; to me
> it sits in front of the set of XDP programs. But I think I could perhaps
> be convinced it does belong there if there is some reasonable way to do
> it. I guess the single-program method would result in an XDP program
> that reads like
> 
>   if (rx_queue == x)
>        do_foo
>   if (rx_queue == y)
>        do_bar

Yes, that is also why I wanted an XDP program per RX queue.  But the
"channelize" concept is more important.

 
> A hardware JIT may be able to sort that out. Or use per-queue
> sections.
> 
> > 
> > the network virtualization traffic is typically encapsulated,
> > so if xdp is used to steer the traffic, the program would need
> > to figure out the vm id based on headers, strip the tunnel, and apply
> > policy before forwarding the packet further. Clearly hw ntuple is not
> > going to suffice.
> >
> > If there is no network virtualization and VMs are operating in a
> > flat network, then there is no policy, no ip filter, no vm
> > migration. Only a mac per vm, and sriov handles this case just fine.
> > When hw becomes more programmable we'll be able to load an xdp program
> > into hw that does tunnel, policy and forwarding into a vf; then sriov
> > will become actually usable for cloud providers.
> 
> Yep :)
> 
> > hw xdp into a vf is more interesting than into a queue, since there is
> > more than one queue/interrupt per vf and a network-heavy vm can
> > actually consume a large amount of traffic.
> >   
> 
> Another use case I have is to make a really high-performance AF_PACKET
> interface. So if there was a way to, say, bind a queue to an AF_PACKET
> ring and run a policy XDP program before hitting the AF_PACKET
> descriptor bit, that would be really interesting, because it would solve
> some of my need for poll-mode drivers in userspace.

+1 yes, a super fast AF_PACKET is also on my wish/todo list for XDP.
It would basically allow for implementing DPDK or netmap on top of XDP
(at least the RX side) without needing to run a NIC driver in userspace.
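
For anyone who wants to experiment with the existing building block, the
sketch below shows roughly what setting up today's mmap'ed AF_PACKET RX
ring (TPACKET_V3) looks like.  It is only an illustration with error
handling trimmed; the interesting missing pieces (binding the ring to a
single hardware RX queue, and running an XDP policy program before the
descriptors are written) do not exist yet:

  #include <arpa/inet.h>
  #include <linux/if_ether.h>
  #include <linux/if_packet.h>
  #include <net/if.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
          int ver = TPACKET_V3;
          struct tpacket_req3 req = {
                  .tp_block_size     = 1 << 22,     /* 4 MB per block   */
                  .tp_block_nr       = 64,
                  .tp_frame_size     = 1 << 11,     /* 2 KB frames      */
                  .tp_frame_nr       = ((1 << 22) / (1 << 11)) * 64,
                  .tp_retire_blk_tov = 60,          /* block timeout ms */
          };
          struct sockaddr_ll ll = {
                  .sll_family   = AF_PACKET,
                  .sll_protocol = htons(ETH_P_ALL),
                  .sll_ifindex  = if_nametoindex("eth2"), /* example NIC */
          };
          void *ring;

          if (fd < 0)
                  return 1;
          setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver));
          setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));
          ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          if (ring == MAP_FAILED ||
              bind(fd, (struct sockaddr *)&ll, sizeof(ll)) < 0) {
                  perror("af_packet ring setup");
                  return 1;
          }
          /* ...consumer loop walks struct tpacket_block_desc /
           *    tpacket3_hdr entries in the mmap'ed area... */
          close(fd);
          return 0;
  }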

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
