netdev - Re: [PATCH v2 bpf-next] cpumap: bulk skb using netif_receive_skb

Message-ID: <20210415172148.4f1e2440@carbon>
Date:   Thu, 15 Apr 2021 17:21:48 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Lorenzo Bianconi <lorenzo@...nel.org>, bpf@...r.kernel.org,
        netdev@...r.kernel.org, lorenzo.bianconi@...hat.com,
        davem@...emloft.net, kuba@...nel.org, ast@...nel.org,
        song@...nel.org, brouer@...hat.com
Subject: Re: [PATCH v2 bpf-next] cpumap: bulk skb using netif_receive_skb_list

On Thu, 15 Apr 2021 17:05:36 +0200
Daniel Borkmann <daniel@...earbox.net> wrote:

> On 4/13/21 6:22 PM, Lorenzo Bianconi wrote:
> > Rely on netif_receive_skb_list routine to send skbs converted from
> > xdp_frames in cpu_map_kthread_run in order to improve i-cache usage.
> > The proposed patch has been tested running xdp_redirect_cpu bpf sample
> > available in the kernel tree that is used to redirect UDP frames from
> > ixgbe driver to a cpumap entry and then to the networking stack.
> > UDP frames are generated using pkt_gen.
> > 
> > $xdp_redirect_cpu  --cpu <cpu> --progname xdp_cpu_map0 --dev <eth>
> > 
> > bpf-next: ~2.2Mpps
> > bpf-next + cpumap skb-list: ~3.15Mpps
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@...nel.org>
> > ---
> > Changes since v1:
> > - fixed comment
> > - rebased on top of bpf-next tree
> > ---
> >   kernel/bpf/cpumap.c | 11 +++++------
> >   1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> > index 0cf2791d5099..d89551a508b2 100644
> > --- a/kernel/bpf/cpumap.c
> > +++ b/kernel/bpf/cpumap.c
> > @@ -27,7 +27,7 @@
> >   #include <linux/capability.h>
> >   #include <trace/events/xdp.h>
> >   
> > -#include <linux/netdevice.h>   /* netif_receive_skb_core */
> > +#include <linux/netdevice.h>   /* netif_receive_skb_list */
> >   #include <linux/etherdevice.h> /* eth_type_trans */
> >   
> >   /* General idea: XDP packets getting XDP redirected to another CPU,
> > @@ -257,6 +257,7 @@ static int cpu_map_kthread_run(void *data)
> >   		void *frames[CPUMAP_BATCH];
> >   		void *skbs[CPUMAP_BATCH];
> >   		int i, n, m, nframes;
> > +		LIST_HEAD(list);
> >   
> >   		/* Release CPU reschedule checks */
> >   		if (__ptr_ring_empty(rcpu->queue)) {
> > @@ -305,7 +306,6 @@ static int cpu_map_kthread_run(void *data)
> >   		for (i = 0; i < nframes; i++) {
> >   			struct xdp_frame *xdpf = frames[i];
> >   			struct sk_buff *skb = skbs[i];
> > -			int ret;
> >   
> >   			skb = __xdp_build_skb_from_frame(xdpf, skb,
> >   							 xdpf->dev_rx);
> > @@ -314,11 +314,10 @@ static int cpu_map_kthread_run(void *data)
> >   				continue;
> >   			}
> >   
> > -			/* Inject into network stack */
> > -			ret = netif_receive_skb_core(skb);
> > -			if (ret == NET_RX_DROP)
> > -				drops++;
> > +			list_add_tail(&skb->list, &list);
> >   		}
> > +		netif_receive_skb_list(&list);
> > +
> >   		/* Feedback loop via tracepoint */
> >   		trace_xdp_cpumap_kthread(rcpu->map_id, n, drops, sched, &stats);  
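The batching idea in the patch is to collect the skbs on a local list with LIST_HEAD()/list_add_tail() and then make one netif_receive_skb_list() call, instead of one netif_receive_skb_core() call per skb. A small userspace C sketch of that pattern follows. It uses no kernel APIs: list_node, list_init(), list_append(), struct pkt and deliver_list() are hypothetical stand-ins, and the i-cache benefit itself is not something this code measures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive doubly linked list, loosely modelled on the
 * kernel's struct list_head (not the kernel implementation). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Append at the tail, preserving arrival order (like list_add_tail()). */
static void list_append(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical packet type; 'list' plays the role of skb->list. */
struct pkt {
	int id;
	struct list_node list;
};

#define node_to_pkt(n) \
	((struct pkt *)((char *)(n) - offsetof(struct pkt, list)))

/* Stand-in for a list-based receive path: the whole batch is walked
 * inside a single call. Returns nothing, like netif_receive_skb_list(). */
static void deliver_list(struct list_node *head, int *ids_out)
{
	int n = 0;

	for (struct list_node *p = head->next; p != head; p = p->next)
		ids_out[n++] = node_to_pkt(p)->id;
}

/* Collect a batch on a local list, then hand it over with one call. */
static void batch_and_deliver(struct pkt *pkts, int count, int *ids_out)
{
	struct list_node head;

	list_init(&head);                          /* like LIST_HEAD(list) */
	for (int i = 0; i < count; i++)
		list_append(&pkts[i].list, &head); /* like list_add_tail() */
	deliver_list(&head, ids_out);              /* one call per batch */
}
```

The receive side is entered once per batch rather than once per packet, which is where the patch expects the i-cache improvement to come from.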
> 
> Given we stop counting drops with netif_receive_skb_list(), we should then
> also remove drops from trace_xdp_cpumap_kthread(), imho, as otherwise it is rather
> misleading (as in: drops are actually happening, but the tracepoint shows 0).
> Given they are not considered stable API, I would just remove those to make it clear
> to users that they cannot rely on this counter anymore anyway.
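Daniel's point can be illustrated with a minimal userspace sketch in which all names are hypothetical. A per-packet call that returns a verdict (as netif_receive_skb_core() can return NET_RX_DROP) lets the caller count drops. A batch call that returns void (as netif_receive_skb_list() does) leaves the caller's counter at 0 even when packets are dropped further down. The drop policy (negative lengths are dropped) is invented purely for illustration.

```c
#include <assert.h>

/* Hypothetical return codes mirroring NET_RX_SUCCESS / NET_RX_DROP. */
enum { RX_SUCCESS = 0, RX_DROP = 1 };

/* Hypothetical stack-internal drop counter, not visible to the caller's
 * own accounting. */
static int stack_internal_drops;

/* Per-packet receive: the verdict is returned to the caller. */
static int rx_one(int len)
{
	if (len < 0) {
		stack_internal_drops++;
		return RX_DROP;
	}
	return RX_SUCCESS;
}

/* Batched receive: same verdicts internally, but nothing is returned. */
static void rx_batch(const int *lens, int n)
{
	for (int i = 0; i < n; i++)
		(void)rx_one(lens[i]);
}

/* Old pattern: the caller counts drops from each return value. */
static int count_drops_per_packet(const int *lens, int n)
{
	int drops = 0;

	for (int i = 0; i < n; i++)
		if (rx_one(lens[i]) == RX_DROP)
			drops++;
	return drops;
}

/* New pattern: the caller never sees a per-packet verdict, so a counter
 * fed into a tracepoint from here would report 0. */
static int count_drops_batched(const int *lens, int n)
{
	int drops = 0;

	rx_batch(lens, n);
	return drops; /* always 0: no verdicts come back from the batch call */
}
```

With two droppable packets in a batch of four, the per-packet path reports 2 while the batched path reports 0, even though the stack dropped both packets in each case.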

After Lorenzo's change, 'drops' is still incremented when kmem_cache_alloc_bulk()
fails to allocate SKBs.  I guess that will not occur very often, but how
can users debug such a case?  Maybe the MM layer can tell us?
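The remaining source of 'drops' can be sketched as below. The sketch assumes that the bulk allocation either fills the whole batch or returns 0, and that a failure counts every frame in the batch as dropped. bulk_alloc() and process_batch() are hypothetical userspace stand-ins built on malloc(); they are not kmem_cache_alloc_bulk() or the cpumap kthread itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical all-or-nothing bulk allocator: returns n on success and 0
 * on failure, leaving nothing allocated. 'fail' injects a failure. */
static int bulk_alloc(void **objs, int n, size_t size, int fail)
{
	if (fail)
		return 0;
	for (int i = 0; i < n; i++) {
		objs[i] = malloc(size);
		if (!objs[i]) {
			while (i--)          /* undo the partial allocation */
				free(objs[i]);
			return 0;
		}
	}
	return n;
}

/* Sketch of the accounting: a failed bulk allocation counts the whole
 * batch as dropped. Returns the number of drops for this batch. */
static int process_batch(int nframes, int inject_failure)
{
	void *skbs[8];
	int drops = 0;

	assert(nframes >= 0 && nframes <= 8);
	int m = bulk_alloc(skbs, nframes, 256, inject_failure);
	if (m == 0) {
		for (int i = 0; i < nframes; i++)
			skbs[i] = NULL;  /* no buffers for this batch */
		drops += nframes;
	}
	/* Building and delivering packets is omitted; just release buffers. */
	for (int i = 0; i < nframes; i++)
		free(skbs[i]);           /* free(NULL) is a no-op */
	return drops;
}
```

Since such a failure only shows up as a bump in this counter, the question of how to observe it otherwise (for example from the MM layer) remains open.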

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer