netdev - Re: [PATCH bpf-next 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush


Message-ID: <20191218134001.319349bc@carbon>
Date:   Wed, 18 Dec 2019 13:40:01 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Björn Töpel <bjorn.topel@...il.com>
Cc:     Netdev <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        bpf <bpf@...r.kernel.org>, David Miller <davem@...emloft.net>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Maciej Fijalkowski <maciejromanfijalkowski@...il.com>,
        brouer@...hat.com
Subject: Re: [PATCH bpf-next 0/8] Simplify
 xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps

On Wed, 18 Dec 2019 13:18:10 +0100
Björn Töpel <bjorn.topel@...il.com> wrote:

> On Wed, 18 Dec 2019 at 13:04, Jesper Dangaard Brouer <brouer@...hat.com> wrote:
> >
> > On Wed, 18 Dec 2019 12:39:53 +0100
> > Björn Töpel <bjorn.topel@...il.com> wrote:
> >  
> > > On Wed, 18 Dec 2019 at 12:11, Jesper Dangaard Brouer <brouer@...hat.com> wrote:  
> > > >
> > > > On Wed, 18 Dec 2019 11:53:52 +0100
> > > > Björn Töpel <bjorn.topel@...il.com> wrote:
> > > >  
> > > > >   $ sudo ./xdp_redirect_cpu --dev enp134s0f0 --cpu 22 xdp_cpu_map0
> > > > >
> > > > >   Running XDP/eBPF prog_name:xdp_cpu_map5_lb_hash_ip_pairs
> > > > >   XDP-cpumap      CPU:to  pps            drop-pps    extra-info
> > > > >   XDP-RX          20      7723038        0           0
> > > > >   XDP-RX          total   7723038        0
> > > > >   cpumap_kthread  total   0              0           0
> > > > >   redirect_err    total   0              0
> > > > >   xdp_exception   total   0              0  
> > > >
> > > > Hmm... I'm missing some counters on the kthread side.
> > > >  
> > >
> > > Oh? Any ideas why? I just ran the upstream sample straight off.  
> >
> > Looks like it happened in commit: bbaf6029c49c ("samples/bpf: Convert
> > XDP samples to libbpf usage") (Cc Maciej).
> >
> > The old bpf_load.c will auto attach the tracepoints... and for libbpf
> > you have to be explicit about it.
> >
> > Can I ask you to also run a test with --stress-mode for
> > ./xdp_redirect_cpu, to flush out any potential RCU race-conditions
> > (don't provide output, this is just a robustness test).
> >  
> 
> Sure! Other than that, does the command line above make sense? I'm
> blasting UDP packets to core 20, and the idea was to re-route them to
> 22.

Yes, and I love that you are using CPUMAP xdp_redirect_cpu as a test.

Explaining what is going on (so you can say if this is what you wanted
to test):

The "XDP-RX" number is the raw XDP redirect number, but the remote CPU,
where the network stack is started, cannot operate at 7.7Mpps, which the
missing tracepoint counters would have shown.  You can still observe the
results via nstat, e.g.:

 # nstat -n && sleep 1 && nstat

On the remote CPU 22, the SKB will be constructed, and likely dropped
due to an overloaded network stack and the lack of a UDP listen port.

I sometimes use:
 # iptables -t raw -I PREROUTING -p udp --dport 9 -j DROP
to drop the UDP packets at an earlier and more consistent stage.

CPUMAP has been carefully designed so that a "producer" is not slowed
down by memory operations done by the "consumer".  This is mostly
achieved via ptr_ring and careful bulking (cache-lines).  As your
driver, i40e, doesn't use 'page_pool', you are not affected by the
return channel.
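
To illustrate the producer/consumer separation, here is a minimal,
single-threaded sketch in the style of ptr_ring.  It is not the kernel
code: the toy_* names, RING_SIZE and BULK_SIZE are made up for this
example, and a real concurrent version needs memory barriers.  The point
it tries to show is that a NULL slot means "empty", so each side only
touches its own index, and that frames are staged locally and pushed to
the shared ring in bursts.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8   /* illustrative only; not the CPUMAP queue size */
#define BULK_SIZE 4   /* illustrative staging-batch size */

/* Toy ring: a NULL slot means "empty", so the producer never reads the
 * consumer index and the consumer never reads the producer index. */
struct toy_ring {
    void *slot[RING_SIZE];
    size_t prod;   /* touched only by the producer */
    size_t cons;   /* touched only by the consumer */
};

/* Returns 0 on success, -1 if the ring is full. */
static int toy_ring_produce(struct toy_ring *r, void *ptr)
{
    assert(ptr != NULL);            /* NULL is reserved as the empty marker */
    if (r->slot[r->prod] != NULL)
        return -1;                  /* consumer has not freed this slot yet */
    r->slot[r->prod] = ptr;
    r->prod = (r->prod + 1) % RING_SIZE;
    return 0;
}

/* Returns the oldest pointer, or NULL if the ring is empty. */
static void *toy_ring_consume(struct toy_ring *r)
{
    void *ptr = r->slot[r->cons];

    if (ptr == NULL)
        return NULL;
    r->slot[r->cons] = NULL;        /* hand the slot back to the producer */
    r->cons = (r->cons + 1) % RING_SIZE;
    return ptr;
}

/* Producer-local staging buffer, flushed to the shared ring in one burst. */
struct toy_bulk {
    void *frame[BULK_SIZE];
    size_t count;
};

/* Push all staged frames; returns how many were dropped on a full ring. */
static size_t toy_bulk_flush(struct toy_bulk *b, struct toy_ring *r)
{
    size_t dropped = 0;

    for (size_t i = 0; i < b->count; i++)
        if (toy_ring_produce(r, b->frame[i]) != 0)
            dropped++;
    b->count = 0;
    return dropped;
}

/* Stage one frame, flushing first if the batch is already full.
 * Returns the number of frames dropped by that flush (0 if none). */
static size_t toy_bulk_enqueue(struct toy_bulk *b, struct toy_ring *r,
                               void *frame)
{
    size_t dropped = 0;

    if (b->count == BULK_SIZE)
        dropped = toy_bulk_flush(b, r);
    b->frame[b->count++] = frame;
    return dropped;
}
```

With a zero-initialised toy_ring and toy_bulk, calling
toy_bulk_enqueue() for every frame and toy_bulk_flush() at the end of a
poll cycle means the shared ring is only written in bursts, and a full
ring shows up as drops on the producer side instead of stalling it.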

Funny test/details: i40e uses a refcnt recycle scheme, based on the
size of the RX-ring, thus it is affected by a longer outstanding queue.
CPUMAP has an intermediate queue that will be full in this overload
setting.  Try to increase or decrease the parameter --qsize (remember
to place it as the first argument), and see if this was the limiting
factor for your XDP-RX number.
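
As a rough way to reason about the queue-size effect, here is a toy
model of an overloaded bounded queue.  Everything in it is an
assumption for illustration (the toy_* names, the tick-based model, and
any rates you feed it); it does not model i40e or CPUMAP internals.  It
only shows that, once the consumer is the bottleneck, the delivered
count is set by the consumer, while the number of frames sitting in the
queue grows with the queue size.

```c
#include <assert.h>

struct toy_result {
    unsigned long delivered;    /* frames the consumer processed */
    unsigned long dropped;      /* frames rejected because the queue was full */
    unsigned long outstanding;  /* frames still queued after the last tick */
};

/* Each tick the producer offers `produce` frames, then the consumer
 * drains up to `consume` frames from a queue holding at most `qsize`. */
static struct toy_result toy_overload(unsigned long qsize,
                                      unsigned long produce,
                                      unsigned long consume,
                                      unsigned long ticks)
{
    struct toy_result res = {0, 0, 0};
    unsigned long queued = 0;

    for (unsigned long t = 0; t < ticks; t++) {
        unsigned long room = qsize - queued;
        unsigned long accepted = produce < room ? produce : room;
        unsigned long drained;

        queued += accepted;
        res.dropped += produce - accepted;

        drained = consume < queued ? consume : queued;
        queued -= drained;
        res.delivered += drained;
    }
    res.outstanding = queued;
    return res;
}
```

Comparing, say, toy_overload(64, 8, 2, 1000) with
toy_overload(1024, 8, 2, 1000) gives the same delivered count but a very
different outstanding count.  In this model the outstanding count is the
only thing --qsize changes, and it is the quantity a recycle scheme tied
to the RX-ring size would be sensitive to.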

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer