netdev - Re: [PATCH v8 04/11] net/mlx4_en: add support for fast rx drop bpf program

Message-ID: <20160715165651.GB3693@ast-mbp.thefacebook.com>
Date:	Fri, 15 Jul 2016 09:56:53 -0700
From:	Alexei Starovoitov <alexei.starovoitov@...il.com>
To:	Jesper Dangaard Brouer <brouer@...hat.com>
Cc:	Brenden Blanco <bblanco@...mgrid.com>, davem@...emloft.net,
	netdev@...r.kernel.org, Jamal Hadi Salim <jhs@...atatu.com>,
	Saeed Mahameed <saeedm@....mellanox.co.il>,
	Martin KaFai Lau <kafai@...com>, Ari Saha <as754m@....com>,
	Or Gerlitz <gerlitz.or@...il.com>, john.fastabend@...il.com,
	hannes@...essinduktion.org, Thomas Graf <tgraf@...g.ch>,
	Tom Herbert <tom@...bertland.com>,
	Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH v8 04/11] net/mlx4_en: add support for fast rx drop bpf
 program

On Fri, Jul 15, 2016 at 10:21:23AM +0200, Jesper Dangaard Brouer wrote:
> > 
> > attaching program to all rings at once is a fundamental part for correct
> > operation. As was pointed out in the past the bpf_prog pointer
> > in the ring design loses atomicity of the update. While the new program is
> > being attached the old program is still running on other rings.
> > That is not something user space can compensate for.
> 
> I don't see a problem with this.  This is how iptables has been
> working for years.  The iptables ruleset exchange is atomic, but only
> atomic per CPU.  It's been working fine for iptables.

And how is that a good thing?
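
To make the atomicity concern concrete, here is a toy user-space model. It
is only a sketch, not mlx4 code: 'struct ring', 'struct dev', NUM_RINGS and
every function name below are hypothetical, and a plain function pointer
stands in for a bpf_prog. With one pointer per ring, an update that has
reached only some of the rings leaves old and new programs running side by
side; with a single device-wide pointer the switch is one atomic store.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NUM_RINGS 4                 /* hypothetical ring count */

typedef int (*prog_fn)(void);       /* stand-in for a bpf_prog */

static int old_prog(void) { return 1; }
static int new_prog(void) { return 2; }

struct ring {
    prog_fn prog;                   /* design A: one pointer per ring */
};

struct dev {
    struct ring rings[NUM_RINGS];
    _Atomic(prog_fn) prog;          /* design B: one pointer per device */
};

/* Design A: update the first 'done' rings only, modelling the window
 * in which packets keep arriving on rings not yet updated. */
static void attach_per_ring_partial(struct dev *d, prog_fn p, size_t done)
{
    for (size_t i = 0; i < done && i < NUM_RINGS; i++)
        d->rings[i].prog = p;
}

/* Number of distinct programs (old_prog/new_prog) the rings run now. */
static int distinct_progs_per_ring(const struct dev *d)
{
    int saw_old = 0, saw_new = 0;

    for (size_t i = 0; i < NUM_RINGS; i++) {
        if (d->rings[i].prog == old_prog)
            saw_old = 1;
        if (d->rings[i].prog == new_prog)
            saw_new = 1;
    }
    return saw_old + saw_new;
}

/* Design B: a single store switches every ring at once. */
static void attach_device_wide(struct dev *d, prog_fn p)
{
    atomic_store(&d->prog, p);
}

/* Design B: every ring reads the same pointer. */
static prog_fn prog_for_ring(struct dev *d, size_t ring)
{
    (void)ring;                     /* ring index is irrelevant here */
    return atomic_load(&d->prog);
}
```

In design A there is always some point during the loop where
distinct_progs_per_ring() reports two programs; in design B every ring sees
either the old or the new pointer, never a mix.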

> > So for current 'one prog for all rings' we cannot do what you're suggesting,
> > yet it doesn't mean we won't do prog per ring tomorrow. To do that the other
> > aspects need to be agreed upon before we jump into implementation:
> > - what is the way for the program to know which ring it's running on?
> >   if there is no such way, then attaching the same prog to multiple
> >   ring is meaningless.
> >   we can easily extend 'struct xdp_md' in the future if we decide
> >   that it's worth doing.
> 
> Not sure we need to extend 'struct xdp_md' with a ring number if we
> allow assigning the program to a specific queue (if not, we might need to).
> 
> The setup sequence would be:
> 1. userspace sets up an ntuple filter steering into a queue number
> 2. userspace registers an XDP program on this queue number
> 3. kernel XDP program queue packets into SPSC queue (like netmap)
>    (no locking, single RX queue, single producer)
> 4. userspace reads packets from SPSC queue (like netmap)
> 
> For step 2, the XDP program should return some identifier for the SPSC
> queue.  And step 3 is, of course, a new XDP feature.
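
For reference, the lock-free single-producer/single-consumer queue that
steps 3 and 4 rely on can be sketched in user space as below. This is an
illustration only: no such XDP feature exists in this patch set, it is not
netmap's implementation, and 'struct spsc_queue', QUEUE_SLOTS, spsc_push()
and spsc_pop() are hypothetical names. The point is that with exactly one
producer (the RX queue) and one consumer (the application), two indices
with acquire/release ordering are enough and no lock is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 8               /* hypothetical; must be a power of two */

struct spsc_queue {
    void *slots[QUEUE_SLOTS];
    _Atomic size_t head;            /* written only by the producer */
    _Atomic size_t tail;            /* written only by the consumer */
};

/* Producer side (step 3): enqueue one packet pointer.
 * Returns false when the queue is full. */
static bool spsc_push(struct spsc_queue *q, void *pkt)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == QUEUE_SLOTS)
        return false;

    q->slots[head & (QUEUE_SLOTS - 1)] = pkt;
    /* Release: the slot write becomes visible before the new head. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (step 4): dequeue one packet pointer.
 * Returns false when the queue is empty. */
static bool spsc_pop(struct spsc_queue *q, void **pkt)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (head == tail)
        return false;

    *pkt = q->slots[tail & (QUEUE_SLOTS - 1)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

The scheme is only correct under the stated constraint of one writer per
index, which is why the sequence above ties each queue to a single RX ring.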

so you want 2 now while having zero code for 3 and 4 ?
Frankly I thought with all the talk about zero copy the goal was
to improve networking performance of VM traffic, but the above sounds
like you want to build a new kernel bypass (like netmap).
In such case there is no need for bpf or xdp at all.
Building kernel bypass looks like a separate problem to me with
its own headaches. We can certainly discuss it, but let's keep
xdp out of the picture then, since the goals are not aligned.

> > - should we allow different programs to attach to different rings?
> 
> Yes, that is one of my main points.  You assume that a single program
> owns the entire NIC.  John's proposal was that he can create 1000's of
> queues, and wanted to bind this to (e.g. 1000) different applications.

reserving rx queues for different applications is yet another very
different problem. I think it would be quite cool to do that, but
I don't see how it's related to what we want to achieve with xdp.