Message-ID: <e3e21c87-d210-4360-8beb-25c6a04ce581@kernel.org>
Date: Wed, 22 May 2024 09:09:45 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>, Boqun Feng <boqun.feng@...il.com>,
Daniel Borkmann <daniel@...earbox.net>, Eric Dumazet <edumazet@...gle.com>,
Frederic Weisbecker <frederic@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>,
Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Stanislav Fomichev <sdf@...gle.com>,
Yonghong Song <yonghong.song@...ux.dev>, bpf <bpf@...r.kernel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: [PATCH net-next 14/15 v2] net: Reference bpf_redirect_info via
task_struct on PREEMPT_RT.
On 17/05/2024 18.15, Sebastian Andrzej Siewior wrote:
> On 2024-05-14 14:20:03 [+0200], Jesper Dangaard Brouer wrote:
>> Trick for CPU-map to do early drop on remote CPU:
>>
>> # ./xdp-bench redirect-cpu --cpu 3 --remote-action drop ixgbe1
>>
>> I recommend pressing Ctrl+\ while running to show more info, like which
>> CPUs are being used and what the kthread consumes. This helps catch
>> issues, e.g. if you are CPU-redirecting to the same CPU that RX happens
>> to run on.
>
> Okay. So I reworked the last two patches to make the struct part of
> task_struct and then did as you suggested:
>
> Unpatched:
> |Sending:
> |Show adapter(s) (eno2np1) statistics (ONLY that changed!)
> |Ethtool(eno2np1 ) stat: 952102520 ( 952,102,520) <= port.tx_bytes /sec
> |Ethtool(eno2np1 ) stat: 14876602 ( 14,876,602) <= port.tx_size_64 /sec
> |Ethtool(eno2np1 ) stat: 14876602 ( 14,876,602) <= port.tx_unicast /sec
> |Ethtool(eno2np1 ) stat: 446045897 ( 446,045,897) <= tx-0.bytes /sec
> |Ethtool(eno2np1 ) stat: 7434098 ( 7,434,098) <= tx-0.packets /sec
> |Ethtool(eno2np1 ) stat: 446556042 ( 446,556,042) <= tx-1.bytes /sec
> |Ethtool(eno2np1 ) stat: 7442601 ( 7,442,601) <= tx-1.packets /sec
> |Ethtool(eno2np1 ) stat: 892592523 ( 892,592,523) <= tx_bytes /sec
> |Ethtool(eno2np1 ) stat: 14876542 ( 14,876,542) <= tx_packets /sec
> |Ethtool(eno2np1 ) stat: 2 ( 2) <= tx_restart /sec
> |Ethtool(eno2np1 ) stat: 2 ( 2) <= tx_stopped /sec
> |Ethtool(eno2np1 ) stat: 14876622 ( 14,876,622) <= tx_unicast /sec
> |
> |Receive:
> |eth1->? 8,732,508 rx/s 0 err,drop/s
> | receive total 8,732,508 pkt/s 0 drop/s 0 error/s
> | cpu:10 8,732,508 pkt/s 0 drop/s 0 error/s
> | enqueue to cpu 3 8,732,510 pkt/s 0 drop/s 7.00 bulk-avg
> | cpu:10->3 8,732,510 pkt/s 0 drop/s 7.00 bulk-avg
> | kthread total 8,732,506 pkt/s 0 drop/s 205,650 sched
> | cpu:3 8,732,506 pkt/s 0 drop/s 205,650 sched
> | xdp_stats 0 pass/s 8,732,506 drop/s 0 redir/s
> | cpu:3 0 pass/s 8,732,506 drop/s 0 redir/s
> | redirect_err 0 error/s
> | xdp_exception 0 hit/s
>
> I verified that the "drop only" case hits 14M packets/s while this
> redirect part reports 8M packets/s.
>
Great, this is a good test.
The transmit speed of 14.88 Mpps is 10G wirespeed at the smallest
Ethernet packet size (84 bytes on the wire including overhead and
interframe gap: 10*10^9/(84*8) = 14,880,952 pps).
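For anyone wanting to double-check that arithmetic, here is a small
standalone C snippet (my own illustration, not code from the thread)
computing the theoretical maximum:

  /* Smallest Ethernet frame is 64 bytes, plus 8 bytes preamble/SFD
   * and 12 bytes interframe gap = 84 bytes on the wire. */
  #include <stdio.h>

  int main(void)
  {
          double link_bps   = 10e9;         /* 10 Gbit/s link speed */
          int    wire_bytes = 64 + 8 + 12;  /* frame + preamble/SFD + IFG */
          double pps        = link_bps / (wire_bytes * 8);

          printf("theoretical max: %.0f pkt/s\n", pps); /* ~14,880,952 */
          return 0;
  }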
> Patched:
> |Sending:
> |Show adapter(s) (eno2np1) statistics (ONLY that changed!)
> |Ethtool(eno2np1 ) stat: 952635404 ( 952,635,404) <= port.tx_bytes /sec
> |Ethtool(eno2np1 ) stat: 14884934 ( 14,884,934) <= port.tx_size_64 /sec
> |Ethtool(eno2np1 ) stat: 14884928 ( 14,884,928) <= port.tx_unicast /sec
> |Ethtool(eno2np1 ) stat: 446496117 ( 446,496,117) <= tx-0.bytes /sec
> |Ethtool(eno2np1 ) stat: 7441602 ( 7,441,602) <= tx-0.packets /sec
> |Ethtool(eno2np1 ) stat: 446603461 ( 446,603,461) <= tx-1.bytes /sec
> |Ethtool(eno2np1 ) stat: 7443391 ( 7,443,391) <= tx-1.packets /sec
> |Ethtool(eno2np1 ) stat: 893086506 ( 893,086,506) <= tx_bytes /sec
> |Ethtool(eno2np1 ) stat: 14884775 ( 14,884,775) <= tx_packets /sec
> |Ethtool(eno2np1 ) stat: 14 ( 14) <= tx_restart /sec
> |Ethtool(eno2np1 ) stat: 14 ( 14) <= tx_stopped /sec
> |Ethtool(eno2np1 ) stat: 14884937 ( 14,884,937) <= tx_unicast /sec
> |
> |Receive:
> |eth1->? 8,735,198 rx/s 0 err,drop/s
> | receive total 8,735,198 pkt/s 0 drop/s 0 error/s
> | cpu:6 8,735,198 pkt/s 0 drop/s 0 error/s
> | enqueue to cpu 3 8,735,193 pkt/s 0 drop/s 7.00 bulk-avg
> | cpu:6->3 8,735,193 pkt/s 0 drop/s 7.00 bulk-avg
> | kthread total 8,735,191 pkt/s 0 drop/s 208,054 sched
> | cpu:3 8,735,191 pkt/s 0 drop/s 208,054 sched
> | xdp_stats 0 pass/s 8,735,191 drop/s 0 redir/s
> | cpu:3 0 pass/s 8,735,191 drop/s 0 redir/s
> | redirect_err 0 error/s
> | xdp_exception 0 hit/s
>
Great, basically zero overhead. Awesome that you verified this!
> This looks to be in the same range/noise level. top-wise I have
> ksoftirqd at 100% and cpumap/./map at ~60%, so I hit the CPU speed
> limit on a 10G link.
For our purpose of testing the XDP_REDIRECT code that you are modifying,
this is what we want: the RX CPU/NAPI is the bottleneck, while the remote
cpumap CPU has idle cycles (also indicated by the 208,054 sched stats).
> perf top shows
I appreciate getting this perf data.
As we are explicitly dealing with splitting the workload across CPUs, it
is worth mentioning that perf supports displaying and filtering on CPUs.
This perf command includes the CPU number (zero-indexed) in the sort key:
# perf report --sort cpu,comm,dso,symbol --no-children
For this benchmark, to stay focused, I would reduce this to:
# perf report --sort cpu,symbol --no-children
The perf tool can also use -C to filter on specific CPUs, like:
# perf report --sort cpu,symbol --no-children -C 3,6
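(Side note, and an assumption on my part since the recording step isn't
shown in the thread: for the per-CPU sorting/filtering to work, the data
needs to be captured system-wide, e.g. via something like
 # perf record -a -g -- sleep 10
before running the perf report commands above.)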
> | 18.37% bpf_prog_4f0ffbb35139c187_cpumap_l4_hash [k] bpf_prog_4f0ffbb35139c187_cpumap_l4_hash
This bpf_prog_4f0ffbb35139c187_cpumap_l4_hash is running on the RX CPU,
doing the load-balancing.
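For readers unfamiliar with the pattern, here is a minimal sketch (my own
illustration, not the actual xdp-bench program) of what such a cpumap
load-balancer looks like: hash the packet, pick a CPU index, and redirect
via a BPF_MAP_TYPE_CPUMAP (whose entries userspace must populate):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_CPUMAP);
          __uint(max_entries, 64);
          __type(key, __u32);
          __type(value, struct bpf_cpumap_val);
  } cpu_map SEC(".maps");

  SEC("xdp")
  int cpumap_redirect(struct xdp_md *ctx)
  {
          /* The real program parses the IP/L4 headers and hashes the
           * flow 5-tuple; faked here from packet length for brevity. */
          __u32 cpu = (ctx->data_end - ctx->data) % 64;

          /* Enqueue the frame to the kthread on the chosen remote CPU. */
          return bpf_redirect_map(&cpu_map, cpu, 0);
  }

  char _license[] SEC("license") = "GPL";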
> | 13.15% [kernel] [k] cpu_map_kthread_run
This runs on the remote cpumap CPU (in this case CPU 3).
> | 12.96% [kernel] [k] ixgbe_poll
> | 6.78% [kernel] [k] page_frag_free
The page_frag_free call might run on the remote cpumap CPU.
> | 5.62% [kernel] [k] xdp_do_redirect
>
> for the top 5. Is this something that looks reasonable?
Yes, except I had to guess how the workload was split between CPUs ;-)
Thanks for doing these benchmarks! :-)
--Jesper