Message-ID: <CALDO+SZcxks4xF-YZEJe3dL2sp9wR7kWYCnAnokhr-y3f9-AeQ@mail.gmail.com>
Date:   Mon, 26 Mar 2018 14:58:02 -0700
From:   William Tu <u9012063@...il.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     Björn Töpel <bjorn.topel@...il.com>,
        magnus.karlsson@...el.com,
        Alexander Duyck <alexander.h.duyck@...el.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        John Fastabend <john.fastabend@...il.com>,
        Alexei Starovoitov <ast@...com>,
        willemdebruijn.kernel@...il.com,
        Daniel Borkmann <daniel@...earbox.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Björn Töpel <bjorn.topel@...el.com>,
        michael.lundkvist@...csson.com, jesse.brandeburg@...el.com,
        anjali.singhai@...el.com, jeffrey.b.shaw@...el.com,
        ferruh.yigit@...el.com, qi.z.zhang@...el.com
Subject: Re: [RFC PATCH 00/24] Introducing AF_XDP support

Hi Jesper,

Thanks a lot for your prompt reply.

>> Hi,
>> I also did an evaluation of AF_XDP, however the performance isn't as
>> good as above.
>> I'd like to share the result and see if there are some tuning suggestions.
>>
>> System:
>> 16 core, Intel(R) Xeon(R) CPU E5-2440 v2 @ 1.90GHz
>> Intel 10G X540-AT2 ---> so I can only run XDP_SKB mode
>
> Hmmm, why is X540-AT2 not able to use XDP natively?

Because I'm only able to use the ixgbe driver for this NIC,
and the AF_XDP patch set only has i40e support?
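
For reference, the driver binding can be confirmed with a standard ethtool
query (enp10s0f0 is the interface used in the runs below):

 $ ethtool -i enp10s0f0 | grep ^driver    # should report ixgbe for this X540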

>
>> AF_XDP performance:
>> Benchmark   XDP_SKB
>> rxdrop      1.27 Mpps
>> txpush      0.99 Mpps
>> l2fwd        0.85 Mpps
>
> Definitely too low...
>
I did another run; the rxdrop number looks better.
Benchmark   XDP_SKB
rxdrop      2.3  Mpps
txpush      1.05 Mpps
l2fwd       0.90 Mpps

> What is the performance if you drop packets via iptables?
>
> Command:
>  $ iptables -t raw -I PREROUTING -p udp --dport 9 --j DROP
>
I did
# iptables -t raw -I PREROUTING -p udp -i enp10s0f0 -j DROP
# iptables -nvL -t raw; sleep 10; iptables -nvL -t raw

and I got 2.9 Mpps from the packet-counter delta over the 10-second interval.
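
Scripted, the same measurement looks roughly like this (assuming the default
-nvL layout, with the rule's packet count in the first column):

# P1=$(iptables -t raw -nvxL PREROUTING | awk '/DROP/ {print $1; exit}')
# sleep 10
# P2=$(iptables -t raw -nvxL PREROUTING | awk '/DROP/ {print $1; exit}')
# echo "$(( (P2 - P1) / 10 )) pps"   # -x gives exact (unscaled) counters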

>> NIC configuration:
>> the command
>> "ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 action 16"
>> doesn't work on my ixgbe driver, so I use ntuple:
>>
>> ethtool -K enp10s0f0 ntuple on
>> ethtool -U enp10s0f0 flow-type udp4 src-ip 10.1.1.100 action 1
>> then
>> echo 1 > /proc/sys/net/core/bpf_jit_enable
>> ./xdpsock -i enp10s0f0 -r -S --queue=1
>>
>> I also take a look at perf result:
>> For rxdrop:
>> 86.56%  xdpsock  xdpsock           [.] main
>>  9.22%  xdpsock  [kernel.vmlinux]  [k] nmi
>>  4.23%  xdpsock  xdpsock           [.] xq_enq
>
> It looks very strange that you see non-maskable interrupt's (NMI) being
> this high...
>
Yes, that's weird. Looking at the perf annotate output for nmi,
it shows 100% of the time spent on a nop instruction.

>
>> For l2fwd:
>>  20.81%  xdpsock xdpsock             [.] main
>>  10.64%  xdpsock [kernel.vmlinux]    [k] clflush_cache_range
>
> Oh, clflush_cache_range is being called!

I thought clflush_cache_range is high because we have many smp_rmb()/smp_wmb()
calls in the xdpsock queue/ring management userspace code.
(perf shows that 75% of this 10.64% is spent on the mfence instruction.)
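
That per-instruction breakdown is from the usual annotate view, something like:

 $ perf annotate --stdio -s clflush_cache_range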

> Do your system use an IOMMU ?
>
Yes, with CONFIG_INTEL_IOMMU=y,
and I saw some related functions being called (e.g. intel_alloc_iova).
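
For reference, the IOMMU state can be double-checked with standard commands,
e.g.:

 $ grep -o 'intel_iommu=[^ ]*' /proc/cmdline   # boot parameter, if set
 $ dmesg | grep -i -e DMAR -e IOMMU            # DMAR/IOMMU init messages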

>>   8.46%  xdpsock  [kernel.vmlinux]    [k] xsk_sendmsg
>>   6.72%  xdpsock  [kernel.vmlinux]    [k] skb_set_owner_w
>>   5.89%  xdpsock  [kernel.vmlinux]    [k] __domain_mapping
>>   5.74%  xdpsock  [kernel.vmlinux]    [k] alloc_skb_with_frags
>>   4.62%  xdpsock  [kernel.vmlinux]    [k] netif_skb_features
>>   3.96%  xdpsock  [kernel.vmlinux]    [k] ___slab_alloc
>>   3.18%  xdpsock  [kernel.vmlinux]    [k] nmi
>
> Again high count for NMI ?!?
>
> Maybe you just forgot to tell perf that you want it to decode the
> bpf_prog correctly?
>
> https://prototype-kernel.readthedocs.io/en/latest/bpf/troubleshooting.html#perf-tool-symbols
>
> Enable via:
>  $ sysctl net/core/bpf_jit_kallsyms=1
>
> And use perf report (while BPF is STILL LOADED):
>
>  $ perf report --kallsyms=/proc/kallsyms
>
> E.g. for emailing this you can use this command:
>
>  $ perf report --sort cpu,comm,dso,symbol --kallsyms=/proc/kallsyms --no-children --stdio -g none | head -n 40
>

Thanks, I followed the steps; the result for l2fwd:
# Total Lost Samples: 119
#
# Samples: 2K of event 'cycles:ppp'
# Event count (approx.): 25675705627
#
# Overhead  CPU  Command  Shared Object       Symbol
# ........  ...  .......  ..................  ..................................
#
    10.48%  013  xdpsock  xdpsock             [.] main
     9.77%  013  xdpsock  [kernel.vmlinux]    [k] clflush_cache_range
     8.45%  013  xdpsock  [kernel.vmlinux]    [k] nmi
     8.07%  013  xdpsock  [kernel.vmlinux]    [k] xsk_sendmsg
     7.81%  013  xdpsock  [kernel.vmlinux]    [k] __domain_mapping
     4.95%  013  xdpsock  [kernel.vmlinux]    [k] ixgbe_xmit_frame_ring
     4.66%  013  xdpsock  [kernel.vmlinux]    [k] skb_store_bits
     4.39%  013  xdpsock  [kernel.vmlinux]    [k] syscall_return_via_sysret
     3.93%  013  xdpsock  [kernel.vmlinux]    [k] pfn_to_dma_pte
     2.62%  013  xdpsock  [kernel.vmlinux]    [k] __intel_map_single
     2.53%  013  xdpsock  [kernel.vmlinux]    [k] __alloc_skb
     2.36%  013  xdpsock  [kernel.vmlinux]    [k] iommu_no_mapping
     2.21%  013  xdpsock  [kernel.vmlinux]    [k] alloc_skb_with_frags
     2.07%  013  xdpsock  [kernel.vmlinux]    [k] skb_set_owner_w
     1.98%  013  xdpsock  [kernel.vmlinux]    [k] __kmalloc_node_track_caller
     1.94%  013  xdpsock  [kernel.vmlinux]    [k] ksize
     1.84%  013  xdpsock  [kernel.vmlinux]    [k] validate_xmit_skb_list
     1.62%  013  xdpsock  [kernel.vmlinux]    [k] kmem_cache_alloc_node
     1.48%  013  xdpsock  [kernel.vmlinux]    [k] __kmalloc_reserve.isra.37
     1.21%  013  xdpsock  xdpsock             [.] xq_enq
     1.08%  013  xdpsock  [kernel.vmlinux]    [k] intel_alloc_iova

And l2fwd under "perf stat" looks OK to me. There are few context switches,
the CPU is fully utilized, and 1.17 insns per cycle seems OK.

 Performance counter stats for 'CPU(s) 6':

      10000.787420      cpu-clock (msec)          #    1.000 CPUs utilized
                24      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                 0      page-faults               #    0.000 K/sec
    22,361,333,647      cycles                    #    2.236 GHz
    13,458,442,838      stalled-cycles-frontend   #   60.19% frontend cycles idle
    26,251,003,067      instructions              #    1.17  insn per cycle
                                                  #    0.51  stalled cycles per insn
     4,938,921,868      branches                  #  493.853 M/sec
         7,591,739      branch-misses             #    0.15% of all branches

      10.000835769 seconds time elapsed
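
For completeness, per-CPU stats like the above come from an invocation along
the lines of:

 $ perf stat -C 6 -- sleep 10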

Will continue investigating...
Thanks
William
