Message-ID: <CAJ+HfNj=wOarZSow5pJ42FrvUDd9LGztRxBMJZWainu61Dnamg@mail.gmail.com>
Date: Tue, 27 Mar 2018 08:09:53 +0200
From: Björn Töpel <bjorn.topel@...il.com>
To: William Tu <u9012063@...il.com>
Cc: Jesper Dangaard Brouer <brouer@...hat.com>,
"Karlsson, Magnus" <magnus.karlsson@...el.com>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <ast@...com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Björn Töpel <bjorn.topel@...el.com>,
michael.lundkvist@...csson.com,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
"Singhai, Anjali" <anjali.singhai@...el.com>,
"Shaw, Jeffrey B" <jeffrey.b.shaw@...el.com>,
"Yigit, Ferruh" <ferruh.yigit@...el.com>,
"Zhang, Qi Z" <qi.z.zhang@...el.com>
Subject: Re: [RFC PATCH 00/24] Introducing AF_XDP support
2018-03-26 23:58 GMT+02:00 William Tu <u9012063@...il.com>:
> Hi Jesper,
>
> Thanks a lot for your prompt reply.
>
>>> Hi,
>>> I also did an evaluation of AF_XDP, however the performance isn't as
>>> good as above.
>>> I'd like to share the result and see if there are some tuning suggestions.
>>>
>>> System:
>>> 16 core, Intel(R) Xeon(R) CPU E5-2440 v2 @ 1.90GHz
>>> Intel 10G X540-AT2 ---> so I can only run XDP_SKB mode
>>
>> Hmmm, why is X540-AT2 not able to use XDP natively?
>
> Because I'm only able to use the ixgbe driver for this NIC,
> and the AF_XDP patch only has i40e support?
>
It's only i40e that supports zero copy. As for native XDP support, only
XDP_REDIRECT support is required, and ixgbe does support XDP_REDIRECT
-- unfortunately, ixgbe still needs a patch to work properly,
which is in net-next: ed93a3987128 ("ixgbe: tweak page counting for
XDP_REDIRECT").
>>
>>> AF_XDP performance:
>>> Benchmark XDP_SKB
>>> rxdrop 1.27 Mpps
>>> txpush 0.99 Mpps
>>> l2fwd 0.85 Mpps
>>
>> Definitely too low...
>>
> I did another run, the rxdrop seems better.
> Benchmark XDP_SKB
> rxdrop 2.3 Mpps
> txpush 1.05 Mpps
> l2fwd 0.90 Mpps
>
>> What is the performance if you drop packets via iptables?
>>
>> Command:
>> $ iptables -t raw -I PREROUTING -p udp --dport 9 --j DROP
>>
> I did
> # iptables -t raw -I PREROUTING -p udp -i enp10s0f0 -j DROP
> # iptables -nvL -t raw; sleep 10; iptables -nvL -t raw
>
> and I got 2.9Mpps.
>
>>> NIC configuration:
>>> the command
>>> "ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 action 16"
>>> doesn't work on my ixgbe driver, so I use ntuple:
>>>
>>> ethtool -K enp10s0f0 ntuple on
>>> ethtool -U enp10s0f0 flow-type udp4 src-ip 10.1.1.100 action 1
>>> then
>>> echo 1 > /proc/sys/net/core/bpf_jit_enable
>>> ./xdpsock -i enp10s0f0 -r -S --queue=1
>>>
>>> I also take a look at perf result:
>>> For rxdrop:
>>> 86.56% xdpsock xdpsock [.] main
>>> 9.22% xdpsock [kernel.vmlinux] [k] nmi
>>> 4.23% xdpsock xdpsock [.] xq_enq
>>
>> It looks very strange that you see non-maskable interrupts (NMI) being
>> this high...
>>
> yes, that's weird. Looking at the perf annotate of nmi,
> it shows 100% spent on nop instruction.
>
>>
>>> For l2fwd:
>>> 20.81% xdpsock xdpsock [.] main
>>> 10.64% xdpsock [kernel.vmlinux] [k] clflush_cache_range
>>
>> Oh, clflush_cache_range is being called!
>
> I thought clflush_cache_range is high because we have many smp_rmb, smp_wmb
> in the xdpsock queue/ring management userspace code.
> (perf shows that 75% of this 10.64% spent on mfence instruction.)
>
>> Does your system use an IOMMU?
>>
> Yes.
> With CONFIG_INTEL_IOMMU=y
> and I saw some related functions called (ex: intel_alloc_iova).
>
>>> 8.46% xdpsock [kernel.vmlinux] [k] xsk_sendmsg
>>> 6.72% xdpsock [kernel.vmlinux] [k] skb_set_owner_w
>>> 5.89% xdpsock [kernel.vmlinux] [k] __domain_mapping
>>> 5.74% xdpsock [kernel.vmlinux] [k] alloc_skb_with_frags
>>> 4.62% xdpsock [kernel.vmlinux] [k] netif_skb_features
>>> 3.96% xdpsock [kernel.vmlinux] [k] ___slab_alloc
>>> 3.18% xdpsock [kernel.vmlinux] [k] nmi
>>
>> Again high count for NMI ?!?
>>
>> Maybe you just forgot to tell perf that you want it to decode the
>> bpf_prog correctly?
>>
>> https://prototype-kernel.readthedocs.io/en/latest/bpf/troubleshooting.html#perf-tool-symbols
>>
>> Enable via:
>> $ sysctl net/core/bpf_jit_kallsyms=1
>>
>> And use perf report (while BPF is STILL LOADED):
>>
>> $ perf report --kallsyms=/proc/kallsyms
>>
>> E.g. for emailing this you can use this command:
>>
>> $ perf report --sort cpu,comm,dso,symbol --kallsyms=/proc/kallsyms --no-children --stdio -g none | head -n 40
>>
>
> Thanks, I followed the steps, the result of l2fwd
> # Total Lost Samples: 119
> #
> # Samples: 2K of event 'cycles:ppp'
> # Event count (approx.): 25675705627
> #
> # Overhead CPU Command Shared Object Symbol
> # ........ ... ....... .................. ..................................
> #
> 10.48% 013 xdpsock xdpsock [.] main
> 9.77% 013 xdpsock [kernel.vmlinux] [k] clflush_cache_range
> 8.45% 013 xdpsock [kernel.vmlinux] [k] nmi
> 8.07% 013 xdpsock [kernel.vmlinux] [k] xsk_sendmsg
> 7.81% 013 xdpsock [kernel.vmlinux] [k] __domain_mapping
> 4.95% 013 xdpsock [kernel.vmlinux] [k] ixgbe_xmit_frame_ring
> 4.66% 013 xdpsock [kernel.vmlinux] [k] skb_store_bits
> 4.39% 013 xdpsock [kernel.vmlinux] [k] syscall_return_via_sysret
> 3.93% 013 xdpsock [kernel.vmlinux] [k] pfn_to_dma_pte
> 2.62% 013 xdpsock [kernel.vmlinux] [k] __intel_map_single
> 2.53% 013 xdpsock [kernel.vmlinux] [k] __alloc_skb
> 2.36% 013 xdpsock [kernel.vmlinux] [k] iommu_no_mapping
> 2.21% 013 xdpsock [kernel.vmlinux] [k] alloc_skb_with_frags
> 2.07% 013 xdpsock [kernel.vmlinux] [k] skb_set_owner_w
> 1.98% 013 xdpsock [kernel.vmlinux] [k] __kmalloc_node_track_caller
> 1.94% 013 xdpsock [kernel.vmlinux] [k] ksize
> 1.84% 013 xdpsock [kernel.vmlinux] [k] validate_xmit_skb_list
> 1.62% 013 xdpsock [kernel.vmlinux] [k] kmem_cache_alloc_node
> 1.48% 013 xdpsock [kernel.vmlinux] [k] __kmalloc_reserve.isra.37
> 1.21% 013 xdpsock xdpsock [.] xq_enq
> 1.08% 013 xdpsock [kernel.vmlinux] [k] intel_alloc_iova
>
> And l2fwd under "perf stat" looks OK to me. There are few context
> switches, the CPU is fully utilized, and 1.17 insn per cycle seems OK.
>
> Performance counter stats for 'CPU(s) 6':
>
>     10000.787420      cpu-clock (msec)         #    1.000 CPUs utilized
>               24      context-switches         #    0.002 K/sec
>                0      cpu-migrations           #    0.000 K/sec
>                0      page-faults              #    0.000 K/sec
>   22,361,333,647      cycles                   #    2.236 GHz
>   13,458,442,838      stalled-cycles-frontend  #   60.19% frontend cycles idle
>   26,251,003,067      instructions             #    1.17  insn per cycle
>                                                #    0.51  stalled cycles per insn
>    4,938,921,868      branches                 #  493.853 M/sec
>        7,591,739      branch-misses            #    0.15% of all branches
>
>   10.000835769 seconds time elapsed
>
> Will continue investigating...
> Thanks
> William