Message-ID: <877e4zd8py.fsf@toke.dk>
Date:   Sun, 20 Oct 2019 12:14:49 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        "Samudrala, Sridhar" <sridhar.samudrala@...el.com>
Cc:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Björn Töpel <bjorn.topel@...el.com>,
        Netdev <netdev@...r.kernel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
        "Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>,
        "Herbert, Tom" <tom.herbert@...el.com>
Subject: Re: FW: [PATCH bpf-next 2/4] xsk: allow AF_XDP sockets to receive packets directly from a queue

Alexei Starovoitov <alexei.starovoitov@...il.com> writes:

> On Fri, Oct 18, 2019 at 05:45:26PM -0700, Samudrala, Sridhar wrote:
>> On 10/18/2019 5:14 PM, Alexei Starovoitov wrote:
>> > On Fri, Oct 18, 2019 at 11:40:07AM -0700, Samudrala, Sridhar wrote:
>> > > 
>> > > Perf report for "AF_XDP default rxdrop" with patched kernel - mitigations ON
>> > > ==========================================================================
>> > > Samples: 44K of event 'cycles', Event count (approx.): 38532389541
>> > > Overhead  Command          Shared Object              Symbol
>> > >    15.31%  ksoftirqd/28     [i40e]                     [k] i40e_clean_rx_irq_zc
>> > >    10.50%  ksoftirqd/28     bpf_prog_80b55d8a76303785  [k] bpf_prog_80b55d8a76303785
>> > >     9.48%  xdpsock          [i40e]                     [k] i40e_clean_rx_irq_zc
>> > >     8.62%  xdpsock          xdpsock                    [.] main
>> > >     7.11%  ksoftirqd/28     [kernel.vmlinux]           [k] xsk_rcv
>> > >     5.81%  ksoftirqd/28     [kernel.vmlinux]           [k] xdp_do_redirect
>> > >     4.46%  xdpsock          bpf_prog_80b55d8a76303785  [k] bpf_prog_80b55d8a76303785
>> > >     3.83%  xdpsock          [kernel.vmlinux]           [k] xsk_rcv
>> > 
>> > Why is everything duplicated?
>> > Does the same code run in different tasks?
>> 
>> Yes, it looks like these functions run from both the application (xdpsock) context and the ksoftirqd context.
>> 
>> > 
>> > >     2.81%  ksoftirqd/28     [kernel.vmlinux]           [k] bpf_xdp_redirect_map
>> > >     2.78%  ksoftirqd/28     [kernel.vmlinux]           [k] xsk_map_lookup_elem
>> > >     2.44%  xdpsock          [kernel.vmlinux]           [k] xdp_do_redirect
>> > >     2.19%  ksoftirqd/28     [kernel.vmlinux]           [k] __xsk_map_redirect
>> > >     1.62%  ksoftirqd/28     [kernel.vmlinux]           [k] xsk_umem_peek_addr
>> > >     1.57%  xdpsock          [kernel.vmlinux]           [k] xsk_umem_peek_addr
>> > >     1.32%  ksoftirqd/28     [kernel.vmlinux]           [k] dma_direct_sync_single_for_cpu
>> > >     1.28%  xdpsock          [kernel.vmlinux]           [k] bpf_xdp_redirect_map
>> > >     1.15%  xdpsock          [kernel.vmlinux]           [k] dma_direct_sync_single_for_device
>> > >     1.12%  xdpsock          [kernel.vmlinux]           [k] xsk_map_lookup_elem
>> > >     1.06%  xdpsock          [kernel.vmlinux]           [k] __xsk_map_redirect
>> > >     0.94%  ksoftirqd/28     [kernel.vmlinux]           [k] dma_direct_sync_single_for_device
>> > >     0.75%  ksoftirqd/28     [kernel.vmlinux]           [k] __x86_indirect_thunk_rax
>> > >     0.66%  ksoftirqd/28     [i40e]                     [k] i40e_clean_programming_status
>> > >     0.64%  ksoftirqd/28     [kernel.vmlinux]           [k] net_rx_action
>> > >     0.64%  swapper          [kernel.vmlinux]           [k] intel_idle
>> > >     0.62%  ksoftirqd/28     [i40e]                     [k] i40e_napi_poll
>> > >     0.57%  xdpsock          [kernel.vmlinux]           [k] dma_direct_sync_single_for_cpu
>> > > 
>> > > Perf report for "AF_XDP direct rxdrop" with patched kernel - mitigations ON
>> > > ==========================================================================
>> > > Samples: 46K of event 'cycles', Event count (approx.): 38387018585
>> > > Overhead  Command          Shared Object             Symbol
>> > >    21.94%  ksoftirqd/28     [i40e]                    [k] i40e_clean_rx_irq_zc
>> > >    14.36%  xdpsock          xdpsock                   [.] main
>> > >    11.53%  ksoftirqd/28     [kernel.vmlinux]          [k] xsk_rcv
>> > >    11.32%  xdpsock          [i40e]                    [k] i40e_clean_rx_irq_zc
>> > >     4.02%  xdpsock          [kernel.vmlinux]          [k] xsk_rcv
>> > >     2.91%  ksoftirqd/28     [kernel.vmlinux]          [k] xdp_do_redirect
>> > >     2.45%  ksoftirqd/28     [kernel.vmlinux]          [k] xsk_umem_peek_addr
>> > >     2.19%  xdpsock          [kernel.vmlinux]          [k] xsk_umem_peek_addr
>> > >     2.08%  ksoftirqd/28     [kernel.vmlinux]          [k] bpf_direct_xsk
>> > >     2.07%  ksoftirqd/28     [kernel.vmlinux]          [k] dma_direct_sync_single_for_cpu
>> > >     1.53%  ksoftirqd/28     [kernel.vmlinux]          [k] dma_direct_sync_single_for_device
>> > >     1.39%  xdpsock          [kernel.vmlinux]          [k] dma_direct_sync_single_for_device
>> > >     1.22%  ksoftirqd/28     [kernel.vmlinux]          [k] xdp_get_xsk_from_qid
>> > >     1.12%  ksoftirqd/28     [i40e]                    [k] i40e_clean_programming_status
>> > >     0.96%  ksoftirqd/28     [i40e]                    [k] i40e_napi_poll
>> > >     0.95%  ksoftirqd/28     [kernel.vmlinux]          [k] net_rx_action
>> > >     0.89%  xdpsock          [kernel.vmlinux]          [k] xdp_do_redirect
>> > >     0.83%  swapper          [i40e]                    [k] i40e_clean_rx_irq_zc
>> > >     0.70%  swapper          [kernel.vmlinux]          [k] intel_idle
>> > >     0.66%  xdpsock          [kernel.vmlinux]          [k] dma_direct_sync_single_for_cpu
>> > >     0.60%  xdpsock          [kernel.vmlinux]          [k] bpf_direct_xsk
>> > >     0.50%  ksoftirqd/28     [kernel.vmlinux]          [k] xsk_umem_discard_addr
>> > > 
>> > > Based on the perf reports comparing AF_XDP default and direct rxdrop, we can say
>> > > that the AF_XDP direct rxdrop codepath avoids the overhead of going through these
>> > > functions:
>> > >     bpf_prog_xxx
>> > >     bpf_xdp_redirect_map
>> > >     xsk_map_lookup_elem
>> > >     __xsk_map_redirect
>> > > With AF_XDP direct, xsk_rcv() is called directly via bpf_direct_xsk() in
>> > > xdp_do_redirect().
>> > 
>> > I don't think you're identifying the overhead correctly.
>> > xsk_map_lookup_elem is 1%,
>> > but bpf_xdp_redirect_map() is supposed to call __xsk_map_lookup_elem(),
>> > which is a different function:
>> > ffffffff81493fe0 T __xsk_map_lookup_elem
>> > ffffffff81492e80 t xsk_map_lookup_elem
>> > 
>> > 10% for bpf_prog_80b55d8a76303785 is huge.
>> > It's the actual code of the program _without_ any helpers.
>> > What does the program actually look like?
>> 
>> It is the XDP program loaded via xsk_load_xdp_prog() in tools/lib/bpf/xsk.c:
>> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/lib/bpf/xsk.c#n268
>
> I see. It looks like map_gen_lookup was never implemented for the xskmap.
> How about adding it first, the way array_map_gen_lookup() is implemented?
> That will easily give a 2x perf gain.

I guess we should implement this for devmaps as well, now that we allow
lookups into those.
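
Something like this, perhaps, modeled on array_map_gen_lookup() in
kernel/bpf/arraymap.c (an untested sketch; the struct layout and the
exact instruction sequence would need checking against the real xskmap):

static u32 xsk_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
	const int ret = BPF_REG_0, mp = BPF_REG_1, index = BPF_REG_2;
	struct bpf_insn *insn = insn_buf;

	/* r0 = *(u32 *)index; bounds-check the key against max_entries */
	*insn++ = BPF_LDX_MEM(BPF_W, ret, index, 0);
	*insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 5);
	/* r0 = xsk_map->xsk_map[index], i.e. load the socket pointer */
	*insn++ = BPF_ALU64_IMM(BPF_LSH, ret, ilog2(sizeof(struct xdp_sock *)));
	*insn++ = BPF_ALU64_IMM(BPF_ADD, mp, offsetof(struct xsk_map, xsk_map));
	*insn++ = BPF_ALU64_REG(BPF_ADD, ret, mp);
	*insn++ = BPF_LDX_MEM(BPF_DW, ret, ret, 0);
	*insn++ = BPF_JMP_IMM(BPF_JA, 0, 0, 1);
	/* out-of-bounds index: return NULL */
	*insn++ = BPF_MOV64_IMM(ret, 0);

	return insn - insn_buf;
}

The verifier would then inline these instructions at the
bpf_map_lookup_elem() call site instead of emitting a helper call,
which is where the speedup would come from.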

However, in this particular example, the lookup from BPF is not actually
needed, since bpf_redirect_map() will return a configurable error value
when the map lookup fails (for exactly this use case).

So replacing:

if (bpf_map_lookup_elem(&xsks_map, &index))
    return bpf_redirect_map(&xsks_map, index, 0);

with simply:

return bpf_redirect_map(&xsks_map, index, XDP_PASS);

would save the call to xsk_map_lookup_elem().
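
For completeness, the whole default program then collapses to something
like this (a sketch using the old-style bpf_map_def definition; the map
size is arbitrary here):

#include <linux/bpf.h>
#include "bpf_helpers.h"

/* One slot per RX queue; a set entry means an AF_XDP socket is bound
 * to that queue_id. */
struct bpf_map_def SEC("maps") xsks_map = {
	.type = BPF_MAP_TYPE_XSKMAP,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 64,
};

SEC("xdp_sock")
int xdp_sock_prog(struct xdp_md *ctx)
{
	/* Redirect to the socket bound to this RX queue; the XDP_PASS in
	 * the flags argument is returned instead when the map slot for
	 * the queue is empty. */
	return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
}

char _license[] SEC("license") = "GPL";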

-Toke
