Message-ID: <20210424090129.1b8fe377@carbon>
Date:   Sat, 24 Apr 2021 09:01:29 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Hangbin Liu <liuhangbin@...il.com>
Cc:     Toke Høiland-Jørgensen <toke@...hat.com>,
        bpf@...r.kernel.org, netdev@...r.kernel.org,
        Jiri Benc <jbenc@...hat.com>,
        Eelco Chaudron <echaudro@...hat.com>, ast@...nel.org,
        Daniel Borkmann <daniel@...earbox.net>,
        Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
        David Ahern <dsahern@...il.com>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        John Fastabend <john.fastabend@...il.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Björn Töpel <bjorn.topel@...il.com>, Martin KaFai Lau <kafai@...com>,
        brouer@...hat.com
Subject: Re: [PATCHv9 bpf-next 2/4] xdp: extend xdp_redirect_map with
 broadcast support

On Sat, 24 Apr 2021 09:09:25 +0800
Hangbin Liu <liuhangbin@...il.com> wrote:

> On Fri, Apr 23, 2021 at 06:54:29PM +0200, Jesper Dangaard Brouer wrote:
> > On Thu, 22 Apr 2021 20:02:18 +0200
> > Toke Høiland-Jørgensen <toke@...hat.com> wrote:
> >   
> > > Jesper Dangaard Brouer <brouer@...hat.com> writes:
> > >   
> > > > On Thu, 22 Apr 2021 15:14:52 +0800
> > > > Hangbin Liu <liuhangbin@...il.com> wrote:
> > > >    
> > > >> diff --git a/net/core/filter.c b/net/core/filter.c
> > > >> index cae56d08a670..afec192c3b21 100644
> > > >> --- a/net/core/filter.c
> > > >> +++ b/net/core/filter.c    
> > > > [...]    
> > > >>  int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > >>  		    struct bpf_prog *xdp_prog)
> > > >>  {
> > > >> @@ -3933,6 +3950,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > >>  	enum bpf_map_type map_type = ri->map_type;
> > > >>  	void *fwd = ri->tgt_value;
> > > >>  	u32 map_id = ri->map_id;
> > > >> +	struct bpf_map *map;
> > > >>  	int err;
> > > >>  
> > > >>  	ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
> > > >> @@ -3942,7 +3960,12 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > >>  	case BPF_MAP_TYPE_DEVMAP:
> > > >>  		fallthrough;
> > > >>  	case BPF_MAP_TYPE_DEVMAP_HASH:
> > > >> -		err = dev_map_enqueue(fwd, xdp, dev);
> > > >> +		map = xchg(&ri->map, NULL);    
> > > >
> > > > Hmm, having this on the fast-path looks dangerous for performance.
> > > > The xchg() call can be expensive; AFAIK it is an atomic operation.
> > > 
> > > Ugh, you're right. That's my bad, I suggested replacing the
> > > READ_ONCE()/WRITE_ONCE() pair with the xchg() because an exchange is
> > > what it's doing, but I failed to consider the performance implications
> > > of the atomic operation. Sorry about that, Hangbin! I guess this should
> > > be changed to:
> > > 
> > > +		map = READ_ONCE(ri->map);
> > > +		if (map) {
> > > +			WRITE_ONCE(ri->map, NULL);
> > > +			err = dev_map_enqueue_multi(xdp, dev, map,
> > > +						    ri->flags & BPF_F_EXCLUDE_INGRESS);
> > > +		} else {
> > > +			err = dev_map_enqueue(fwd, xdp, dev);
> > > +		}  
> > 
> > This is highly sensitive fast-path code; as you saw, Bjørn has been
> > hunting nanoseconds in this area.  The above code implicitly treats
> > "map" as the likely option, which I don't think it is.
> 
> Hi Jesper,
> 
> From the performance data, there is only a slight impact. Do we still need
> to block the whole patch on this? Or do you have a better solution?

I'm basically just asking you to add an unlikely() annotation:

	map = READ_ONCE(ri->map);
	if (unlikely(map)) {
		WRITE_ONCE(ri->map, NULL);
		err = dev_map_enqueue_multi(xdp, dev, map, [...]
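
For context: in the kernel, unlikely() is essentially a branch-prediction
hint to the compiler (see include/linux/compiler.h); a rough sketch of the
macros:

	/* Tell the compiler which branch outcome to expect */
	# define likely(x)	__builtin_expect(!!(x), 1)
	# define unlikely(x)	__builtin_expect(!!(x), 0)

With that hint the compiler can lay out the dev_map_enqueue_multi() branch
out of line, so the existing non-broadcast path stays on straight-line,
cache-hot code.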

For XDP, performance is the single most important factor!  You say your
performance data shows only a slight impact, but there must be ZERO
impact (when your added feature is not in use).

Your data:
 Version          | Test                                | Generic | Native
 5.12 rc4         | redirect_map        i40e->i40e      |    1.9M |  9.6M
 5.12 rc4 + patch | redirect_map        i40e->i40e      |    1.9M |  9.3M

The performance drop from 9.6M to 9.3M pps is a slowdown of 3.36 nanosec
per packet.  Bjørn and others have been working really hard to optimize
this code, shaving off overheads as small as 1.5 nanosec.  Thus,
introducing 3.36 nanosec of added overhead on the fast-path is significant.
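
To spell out the arithmetic (per-packet cost is the inverse of the packet
rate, using the numbers from the table above):

	1 / 9.6 Mpps = 104.17 ns/packet   (before the patch)
	1 / 9.3 Mpps = 107.53 ns/packet   (with the patch)
	107.53 ns - 104.17 ns = 3.36 ns added per packet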

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
