netdev - Re: [PATCHv17 bpf-next 1/6] bpf: run devmap xdp_prog on flush instead of bulk enqueue

Message-ID: <60118c586000d_9913c208c2@john-XPS-13-9370.notmuch>
Date:   Wed, 27 Jan 2021 07:52:56 -0800
From:   John Fastabend <john.fastabend@...il.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc:     Toke Høiland-Jørgensen <toke@...hat.com>,
        John Fastabend <john.fastabend@...il.com>,
        Hangbin Liu <liuhangbin@...il.com>, bpf@...r.kernel.org,
        netdev@...r.kernel.org, Jiri Benc <jbenc@...hat.com>,
        Eelco Chaudron <echaudro@...hat.com>, ast@...nel.org,
        Daniel Borkmann <daniel@...earbox.net>,
        Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
        David Ahern <dsahern@...il.com>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        brouer@...hat.com
Subject: Re: [PATCHv17 bpf-next 1/6] bpf: run devmap xdp_prog on flush instead
 of bulk enqueue

Jesper Dangaard Brouer wrote:
> On Wed, 27 Jan 2021 13:20:50 +0100
> Maciej Fijalkowski <maciej.fijalkowski@...el.com> wrote:
> 
> > On Wed, Jan 27, 2021 at 10:41:44AM +0100, Toke Høiland-Jørgensen wrote:
> > > John Fastabend <john.fastabend@...il.com> writes:
> > >   
> > > > Hangbin Liu wrote:  
> > > >> From: Jesper Dangaard Brouer <brouer@...hat.com>
> > > >> 
> > > >> This changes the devmap XDP program support to run the program when the
> > > >> bulk queue is flushed instead of before the frame is enqueued. This has
> > > >> a couple of benefits:
> > > >> 
> > > >> - It "sorts" the packets by destination devmap entry, and then runs the
> > > >>   same BPF program on all the packets in sequence. This ensures that we
> > > >>   keep the XDP program and destination device properties hot in I-cache.
> > > >> 
> > > >> - It makes the multicast implementation simpler because it can just
> > > >>   enqueue packets using bq_enqueue() without having to deal with the
> > > >>   devmap program at all.
> > > >> 
> > > >> The drawback is that if the devmap program drops the packet, the enqueue
> > > >> step is redundant. However, arguably this is mostly visible in a
> > > >> micro-benchmark, and with more mixed traffic the I-cache benefit should
> > > >> win out. The performance impact of just this patch is as follows:
> > > >> 
> > > >> The bq_xmit_all's logic is also refactored and error label is removed.
> > > >> When bq_xmit_all() is called from bq_enqueue(), another packet will
> > > >> always be enqueued immediately after, so clearing dev_rx, xdp_prog and
> > > >> flush_node in bq_xmit_all() is redundant. Let's move the clear to
> > > >> __dev_flush(), and only check them once in bq_enqueue() since they are
> > > >> all modified together.
> > > >> 
> > > > >> Tested with xdp_redirect_map from samples/bpf, sending packets via the pktgen command:
> > > >> ./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64
> > > >> 
> > > > >> There is about +/- 0.1M deviation in native testing. Performance
> > > > >> improved for the base case, but dropped back somewhat with an xdp devmap prog attached.
> > > >> 
> > > >> Version          | Test                           | Generic | Native | Native + 2nd xdp_prog
> > > >> 5.10 rc6         | xdp_redirect_map   i40e->i40e  |    2.0M |   9.1M |  8.0M
> > > >> 5.10 rc6         | xdp_redirect_map   i40e->veth  |    1.7M |  11.0M |  9.7M
> > > >> 5.10 rc6 + patch | xdp_redirect_map   i40e->i40e  |    2.0M |   9.5M |  7.5M
> > > >> 5.10 rc6 + patch | xdp_redirect_map   i40e->veth  |    1.7M |  11.6M |  9.1M
> > > >>   
> > > >
> > > > [...]

Acked-by: John Fastabend <john.fastabend@...il.com>

> > > >>  static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
> > > >>  {
> > > >>  	struct net_device *dev = bq->dev;
> > > >> -	int sent = 0, drops = 0, err = 0;
> > > >> +	unsigned int cnt = bq->count;
> > > >> +	int drops = 0, err = 0;
> > > >> +	int to_send = cnt;
> > > >> +	int sent = cnt;
> > > >>  	int i;
> > > >>  
> > > >> -	if (unlikely(!bq->count))
> > > >> +	if (unlikely(!cnt))
> > > >>  		return;
> > > >>  
> > > >> -	for (i = 0; i < bq->count; i++) {
> > > >> +	for (i = 0; i < cnt; i++) {
> > > >>  		struct xdp_frame *xdpf = bq->q[i];
> > > >>  
> > > >>  		prefetch(xdpf);
> > > >>  	}
> > > >>  
> > > >> -	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
> > > >> +	if (bq->xdp_prog) {
> > > >> +		to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev);
> > > >> +		if (!to_send) {
> > > >> +			sent = 0;
> > > >> +			goto out;
> > > >> +		}
> > > >> +		drops = cnt - to_send;
> > > >> +	}  
> > > >
> > > > I might be missing something about how *bq works here. What happens when
> > > > dev_map_bpf_prog_run returns to_send < cnt?
> > > >
> > > > So I read this as it will send [0, to_send] and [to_send, cnt] will be
> > > > dropped? How do we know the bpf prog would have dropped the set,
> > > > [to_send+1, cnt]?  
> > 
> > > You know that from the 'drops' value, which is recalculated after
> > > dev_map_bpf_prog_run() returns and is later passed to trace_xdp_devmap_xmit.
> > 
> > > 
> > > Because dev_map_bpf_prog_run() compacts the array:
> > > 
> > > +		case XDP_PASS:
> > > +			err = xdp_update_frame_from_buff(&xdp, xdpf);
> > > +			if (unlikely(err < 0))
> > > +				xdp_return_frame_rx_napi(xdpf);
> > > +			else
> > > +				frames[nframes++] = xdpf;
> > > +			break;  
> > 
> > > To expand on this a little: the 'frames' array is reused, and 'nframes' above is
> > > the value that is returned, which we store in the 'to_send' variable.
> > 

In the morning, with coffee, this looks good to me. Thanks Toke, Jesper.