lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 15 Aug 2019 19:59:11 +0900
From:   Toshiaki Makita <toshiaki.makita1@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, William Tu <u9012063@...il.com>
Subject: Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP

On 2019/08/14 16:33, Toshiaki Makita wrote:
>>>    bpf, hashtab: Compare keys in long
>>
>> 3Mpps vs 4Mpps just from this patch ?
>> or combined with i40 prefech patch ?
> 
> Combined.
> 
>>>   drivers/net/ethernet/intel/i40e/i40e_txrx.c  |    1 +
>>
>> Could you share "perf report" for just hash tab optimization
>> and for i40 ?
> 
> Sure, I'll get some more data and post them.

Here are perf report and performance numbers.
This time for some reason the performance is better than before.
Something in my env may have changed but could not identify that.

I cut and paste top 10 functions from perf report with drop rate for each case.
perf report is run with --no-child option, so does not include child functions load.
It looks like the hottest function is always xdp_flow BPF program for XDP,
but the shown function name is some meaningless one, like __this_module+0x800000007446.

- No prefetch, no long-compare

   3.3 Mpps

     25.22%  ksoftirqd/4      [kernel.kallsyms]             [k] __this_module+0x800000007446
     21.64%  ksoftirqd/4      [kernel.kallsyms]             [k] __htab_map_lookup_elem
     14.93%  ksoftirqd/4      [kernel.kallsyms]             [k] memcmp
      7.07%  ksoftirqd/4      [kernel.kallsyms]             [k] i40e_clean_rx_irq
      4.57%  ksoftirqd/4      [kernel.kallsyms]             [k] dev_map_enqueue
      3.60%  ksoftirqd/4      [kernel.kallsyms]             [k] lookup_nulls_elem_raw
      3.44%  ksoftirqd/4      [kernel.kallsyms]             [k] page_frag_free
      2.69%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_rcv
      2.29%  ksoftirqd/4      [kernel.kallsyms]             [k] xdp_do_redirect
      1.51%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_xmit

- With prefetch, no long-compare

   3.7 Mpps

     25.02%  ksoftirqd/4      [kernel.kallsyms]             [k] mirred_list_lock+0x800000008052
     21.52%  ksoftirqd/4      [kernel.kallsyms]             [k] __htab_map_lookup_elem
     13.20%  ksoftirqd/4      [kernel.kallsyms]             [k] memcmp
      7.38%  ksoftirqd/4      [kernel.kallsyms]             [k] i40e_clean_rx_irq
      4.09%  ksoftirqd/4      [kernel.kallsyms]             [k] lookup_nulls_elem_raw
      3.57%  ksoftirqd/4      [kernel.kallsyms]             [k] dev_map_enqueue
      3.50%  ksoftirqd/4      [kernel.kallsyms]             [k] page_frag_free
      2.86%  ksoftirqd/4      [kernel.kallsyms]             [k] xdp_do_redirect
      2.84%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_rcv
      1.79%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_xmit

- No prefetch, with long-compare

   4.2 Mpps

     24.64%  ksoftirqd/4      [kernel.kallsyms]             [k] __this_module+0x800000008f47
     24.42%  ksoftirqd/4      [kernel.kallsyms]             [k] __htab_map_lookup_elem
      6.91%  ksoftirqd/4      [kernel.kallsyms]             [k] i40e_clean_rx_irq
      4.04%  ksoftirqd/4      [kernel.kallsyms]             [k] page_frag_free
      3.53%  ksoftirqd/4      [kernel.kallsyms]             [k] lookup_nulls_elem_raw
      3.14%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_rcv
      3.13%  ksoftirqd/4      [kernel.kallsyms]             [k] dev_map_enqueue
      2.34%  ksoftirqd/4      [kernel.kallsyms]             [k] xdp_do_redirect
      1.76%  ksoftirqd/4      [kernel.kallsyms]             [k] key_equal
      1.37%  ksoftirqd/4      [kernel.kallsyms]             [k] zero_key+0x800000010e93

   NOTE: key_equal is called in place of memcmp.

- With prefetch, with long-compare

   4.6 Mpps

     26.68%  ksoftirqd/4      [kernel.kallsyms]             [k] mirred_list_lock+0x80000000a109
     22.37%  ksoftirqd/4      [kernel.kallsyms]             [k] __htab_map_lookup_elem
     10.79%  ksoftirqd/4      [kernel.kallsyms]             [k] i40e_clean_rx_irq
      4.74%  ksoftirqd/4      [kernel.kallsyms]             [k] page_frag_free
      4.09%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_rcv
      3.97%  ksoftirqd/4      [kernel.kallsyms]             [k] dev_map_enqueue
      3.79%  ksoftirqd/4      [kernel.kallsyms]             [k] lookup_nulls_elem_raw
      3.09%  ksoftirqd/4      [kernel.kallsyms]             [k] xdp_do_redirect
      2.45%  ksoftirqd/4      [kernel.kallsyms]             [k] key_equal
      1.91%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_xmit

Toshiaki Makita

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ