Message-ID: <20210424090129.1b8fe377@carbon>
Date:   Sat, 24 Apr 2021 09:01:29 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Hangbin Liu <liuhangbin@...il.com>
Cc:     Toke Høiland-Jørgensen <toke@...hat.com>,
        bpf@...r.kernel.org, netdev@...r.kernel.org,
        Jiri Benc <jbenc@...hat.com>,
        Eelco Chaudron <echaudro@...hat.com>, ast@...nel.org,
        Daniel Borkmann <daniel@...earbox.net>,
        Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
        David Ahern <dsahern@...il.com>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        John Fastabend <john.fastabend@...il.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Björn Töpel <bjorn.topel@...il.com>, Martin KaFai Lau <kafai@...com>,
        brouer@...hat.com
Subject: Re: [PATCHv9 bpf-next 2/4] xdp: extend xdp_redirect_map with
 broadcast support

On Sat, 24 Apr 2021 09:09:25 +0800
Hangbin Liu <liuhangbin@...il.com> wrote:

> On Fri, Apr 23, 2021 at 06:54:29PM +0200, Jesper Dangaard Brouer wrote:
> > On Thu, 22 Apr 2021 20:02:18 +0200
> > Toke Høiland-Jørgensen <toke@...hat.com> wrote:
> >   
> > > Jesper Dangaard Brouer <brouer@...hat.com> writes:
> > >   
> > > > On Thu, 22 Apr 2021 15:14:52 +0800
> > > > Hangbin Liu <liuhangbin@...il.com> wrote:
> > > >    
> > > >> diff --git a/net/core/filter.c b/net/core/filter.c
> > > >> index cae56d08a670..afec192c3b21 100644
> > > >> --- a/net/core/filter.c
> > > >> +++ b/net/core/filter.c    
> > > > [...]    
> > > >>  int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > >>  		    struct bpf_prog *xdp_prog)
> > > >>  {
> > > >> @@ -3933,6 +3950,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > >>  	enum bpf_map_type map_type = ri->map_type;
> > > >>  	void *fwd = ri->tgt_value;
> > > >>  	u32 map_id = ri->map_id;
> > > >> +	struct bpf_map *map;
> > > >>  	int err;
> > > >>  
> > > >>  	ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
> > > >> @@ -3942,7 +3960,12 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > >>  	case BPF_MAP_TYPE_DEVMAP:
> > > >>  		fallthrough;
> > > >>  	case BPF_MAP_TYPE_DEVMAP_HASH:
> > > >> -		err = dev_map_enqueue(fwd, xdp, dev);
> > > >> +		map = xchg(&ri->map, NULL);    
> > > >
> > > > Hmm, having this on the fast-path looks dangerous for performance.
> > > > The xchg() call can be expensive; AFAIK it is an atomic operation.
> > > 
> > > Ugh, you're right. That's my bad, I suggested replacing the
> > > READ_ONCE()/WRITE_ONCE() pair with the xchg() because an exchange is
> > > what it's doing, but I failed to consider the performance implications
> > > of the atomic operation. Sorry about that, Hangbin! I guess this should
> > > be changed to:
> > > 
> > > +		map = READ_ONCE(ri->map);
> > > +		if (map) {
> > > +			WRITE_ONCE(ri->map, NULL);
> > > +			err = dev_map_enqueue_multi(xdp, dev, map,
> > > +						    ri->flags & BPF_F_EXCLUDE_INGRESS);
> > > +		} else {
> > > +			err = dev_map_enqueue(fwd, xdp, dev);
> > > +		}  
> > 
> > This is highly sensitive fast-path code; as you saw, Bjørn has been
> > hunting nanoseconds in this area.  The above code implicitly treats
> > "map" as the likely option, which I don't think it is.
> 
> Hi Jesper,
> 
> From the performance data, there is only a slight impact. Do we still need
> to block the whole patch on this? Or do you have a better solution?

I'm basically just asking you to add an unlikely() annotation:

	map = READ_ONCE(ri->map);
	if (unlikely(map)) {
		WRITE_ONCE(ri->map, NULL);
		err = dev_map_enqueue_multi(xdp, dev, map, [...]
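
For context: in the kernel, unlikely() is essentially a branch-prediction
hint to the compiler (see include/linux/compiler.h); a rough sketch of the
macros:

	/* Tell the compiler which branch outcome to expect */
	# define likely(x)	__builtin_expect(!!(x), 1)
	# define unlikely(x)	__builtin_expect(!!(x), 0)

With that hint the compiler can lay out the dev_map_enqueue_multi() branch
out of line, so the existing non-broadcast path stays on straight-line,
cache-hot code.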

For XDP, performance is the single most important factor!  You say your
performance data shows only a slight impact, but there must be ZERO
impact (when your added feature is not in use).

Your data:
 Version          | Test                                | Generic | Native
 5.12 rc4         | redirect_map        i40e->i40e      |    1.9M |  9.6M
 5.12 rc4 + patch | redirect_map        i40e->i40e      |    1.9M |  9.3M

The performance drop from 9.6M to 9.3M pps is a slowdown of 3.36 nanosec
per packet.  Bjørn and others have been working really hard to optimize
this code, shaving off overheads as small as 1.5 nanosec.  Thus,
introducing 3.36 nanosec of added overhead on the fast-path is significant.
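
To spell out the arithmetic (per-packet cost is the inverse of the packet
rate, using the numbers from the table above):

	1 / 9.6 Mpps = 104.17 ns/packet   (before the patch)
	1 / 9.3 Mpps = 107.53 ns/packet   (with the patch)
	107.53 ns - 104.17 ns = 3.36 ns added per packet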

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
