Message-ID: <20240220153206.AUZ_zP24@linutronix.de>
Date: Tue, 20 Feb 2024 16:32:06 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>,
bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH RFC net-next 1/2] net: Reference bpf_redirect_info via
task_struct on PREEMPT_RT.
On 2024-02-20 13:57:24 [+0100], Jesper Dangaard Brouer wrote:
> > so I replaced nr_cpu_ids with 64 and booted with maxcpus=64 so that I can run
> > xdp-bench on the ixgbe.
> >
>
> Yes, ixgbe HW has limited TX queues, and XDP tries to allocate a
> hardware TX queue for every CPU in the system. So, I guess you have too
> many CPUs in your system - lol.
>
> Other drivers have a fallback to a locked XDP TX path, so this is also
> something to look out for in the machine with i40e.
This locked XDP TX path kicks in at 64 CPUs, but XDP programs are rejected with more than 64 * 2 CPUs.
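To illustrate what I mean by the locked path, a rough sketch of the pattern (hypothetical example_* names, not the actual ixgbe/i40e code): with more CPUs than hardware XDP TX rings, several CPUs share one ring and serialize on a per-ring lock behind a static key; below that threshold every CPU transmits on its own ring without taking the lock.

/*
 * Illustrative sketch only -- the example_* names are made up and do
 * not exist in the tree.  The static key would be enabled by the
 * driver when nr_cpu_ids exceeds the number of XDP TX rings.
 */
static int example_xdp_xmit_frame(struct example_adapter *adapter,
				  struct xdp_frame *xdpf)
{
	u32 index = smp_processor_id() % adapter->num_xdp_rings;
	struct example_ring *ring = adapter->xdp_ring[index];
	int ret;

	if (static_branch_unlikely(&example_xdp_locking_key))
		spin_lock(&ring->tx_lock);

	ret = example_xmit_xdp_ring(ring, xdpf);

	if (static_branch_unlikely(&example_xdp_locking_key))
		spin_unlock(&ring->tx_lock);

	return ret;
}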
> > so: i40e sends, ixgbe receives.
> >
> > -t 2
> >
> > | Summary 2,348,800 rx/s 0 err/s
> > | receive total 2,348,800 pkt/s 2,348,800 drop/s 0 error/s
> > | cpu:0 2,348,800 pkt/s 2,348,800 drop/s 0 error/s
> > | xdp_exception 0 hit/s
> >
>
> This is way too low, with i40e sending.
>
> On my system with only -t 1 my i40e driver can send with approx 15Mpps:
>
> Ethtool(i40e2) stat: 15028585 ( 15,028,585) <= tx-0.packets /sec
> Ethtool(i40e2) stat: 15028589 ( 15,028,589) <= tx_packets /sec
-t 1, ixgbe sending:
Show adapter(s) (eth1) statistics (ONLY that changed!)
Ethtool(eth1 ) stat: 107857263 ( 107,857,263) <= tx_bytes /sec
Ethtool(eth1 ) stat: 115047684 ( 115,047,684) <= tx_bytes_nic /sec
Ethtool(eth1 ) stat: 1797621 ( 1,797,621) <= tx_packets /sec
Ethtool(eth1 ) stat: 1797636 ( 1,797,636) <= tx_pkts_nic /sec
Ethtool(eth1 ) stat: 107857263 ( 107,857,263) <= tx_queue_0_bytes /sec
Ethtool(eth1 ) stat: 1797621 ( 1,797,621) <= tx_queue_0_packets /sec
-t 1, i40e sending:
Ethtool(eno2np1 ) stat: 90 ( 90) <= port.rx_bytes /sec
Ethtool(eno2np1 ) stat: 1 ( 1) <= port.rx_size_127 /sec
Ethtool(eno2np1 ) stat: 1 ( 1) <= port.rx_unicast /sec
Ethtool(eno2np1 ) stat: 79554379 ( 79,554,379) <= port.tx_bytes /sec
Ethtool(eno2np1 ) stat: 1243037 ( 1,243,037) <= port.tx_size_64 /sec
Ethtool(eno2np1 ) stat: 1243037 ( 1,243,037) <= port.tx_unicast /sec
Ethtool(eno2np1 ) stat: 86 ( 86) <= rx-32.bytes /sec
Ethtool(eno2np1 ) stat: 1 ( 1) <= rx-32.packets /sec
Ethtool(eno2np1 ) stat: 86 ( 86) <= rx_bytes /sec
Ethtool(eno2np1 ) stat: 1 ( 1) <= rx_cache_waive /sec
Ethtool(eno2np1 ) stat: 1 ( 1) <= rx_packets /sec
Ethtool(eno2np1 ) stat: 1 ( 1) <= rx_unicast /sec
Ethtool(eno2np1 ) stat: 74580821 ( 74,580,821) <= tx-0.bytes /sec
Ethtool(eno2np1 ) stat: 1243014 ( 1,243,014) <= tx-0.packets /sec
Ethtool(eno2np1 ) stat: 74580821 ( 74,580,821) <= tx_bytes /sec
Ethtool(eno2np1 ) stat: 1243014 ( 1,243,014) <= tx_packets /sec
Ethtool(eno2np1 ) stat: 1243037 ( 1,243,037) <= tx_unicast /sec
Mine is slightly slower, but this seems to match what I see on the RX
side.
> At this level, if you can verify that CPU:60 is 100% loaded, and the packet
> generator is sending more than the rx number, then it could work as a valid
> experiment.
i40e receiving on CPU 8:
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni, 84.8 id, 0.0 wa, 0.0 hi, 15.2 si, 0.0 st
ixgbe receiving on CPU 13:
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni, 56.7 id, 0.0 wa, 0.0 hi, 43.3 si, 0.0 st
Both look mostly idle. On the sending side, kpktgend_0 is always at 100%.
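In case it helps to cross-check top's numbers, here is a small standalone sketch (an illustrative helper of mine, nothing from the tree) that derives the %si / %id share of one CPU from two /proc/stat samples; run with argument 13 it should roughly reproduce the %Cpu13 line above:

/* Illustrative only: sample /proc/stat twice, one second apart, and
 * print the softirq and idle share of the given CPU -- roughly the
 * %si / %id columns that top shows. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct ticks { unsigned long long u, n, s, idle, io, irq, si, st; };

static int read_cpu(int cpu, struct ticks *t)
{
	char tag[16], line[256];
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return -1;
	snprintf(tag, sizeof(tag), "cpu%d ", cpu);
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, tag, strlen(tag)) &&
		    sscanf(line + strlen(tag),
			   "%llu %llu %llu %llu %llu %llu %llu %llu",
			   &t->u, &t->n, &t->s, &t->idle,
			   &t->io, &t->irq, &t->si, &t->st) == 8) {
			fclose(f);
			return 0;
		}
	}
	fclose(f);
	return -1;
}

int main(int argc, char **argv)
{
	int cpu = argc > 1 ? atoi(argv[1]) : 0;
	struct ticks a, b;
	double total;

	if (read_cpu(cpu, &a))
		return 1;
	sleep(1);
	if (read_cpu(cpu, &b))
		return 1;
	total = (b.u - a.u) + (b.n - a.n) + (b.s - a.s) + (b.idle - a.idle) +
		(b.io - a.io) + (b.irq - a.irq) + (b.si - a.si) + (b.st - a.st);
	if (total <= 0)
		return 1;
	printf("cpu%d: %.1f%% si, %.1f%% idle\n", cpu,
	       100.0 * (b.si - a.si) / total,
	       100.0 * (b.idle - a.idle) / total);
	return 0;
}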
> > -t 18
> > | Summary 7,784,946 rx/s 0 err/s
> > | receive total 7,784,946 pkt/s 7,784,946 drop/s 0 error/s
> > | cpu:60 7,784,946 pkt/s 7,784,946 drop/s 0 error/s
> > | xdp_exception 0 hit/s
> >
> > after -t 18 it drops down to 2,…
> > Now I got worse results than before, since -t 8 says 7,5… and it did 8,4 in
> > the morning. Do you maybe have a .config for me, in case I did not enable
> > the performance switch?
> >
>
> I would look for root-cause with perf record +
> perf report --sort cpu,comm,dso,symbol --no-children
While sending with ixgbe, perf top on the box shows:
| Samples: 621K of event 'cycles', 4000 Hz, Event count (approx.): 49979376685 lost: 0/0 drop: 0/0
| Overhead CPU Command Shared Object Symbol
| 31.98% 000 kpktgend_0 [kernel] [k] xas_find
| 6.72% 000 kpktgend_0 [kernel] [k] pfn_to_dma_pte
| 5.63% 000 kpktgend_0 [kernel] [k] ixgbe_xmit_frame_ring
| 4.78% 000 kpktgend_0 [kernel] [k] dma_pte_clear_level
| 3.16% 000 kpktgend_0 [kernel] [k] __iommu_dma_unmap
| 2.30% 000 kpktgend_0 [kernel] [k] fq_ring_free_locked
| 1.99% 000 kpktgend_0 [kernel] [k] __domain_mapping
| 1.82% 000 kpktgend_0 [kernel] [k] iommu_dma_alloc_iova
| 1.80% 000 kpktgend_0 [kernel] [k] __iommu_map
| 1.72% 000 kpktgend_0 [kernel] [k] iommu_pgsize.isra.0
| 1.70% 000 kpktgend_0 [kernel] [k] __iommu_dma_map
| 1.63% 000 kpktgend_0 [kernel] [k] alloc_iova_fast
| 1.59% 000 kpktgend_0 [kernel] [k] _raw_spin_lock_irqsave
| 1.32% 000 kpktgend_0 [kernel] [k] iommu_map
| 1.30% 000 kpktgend_0 [kernel] [k] iommu_dma_map_page
| 1.23% 000 kpktgend_0 [kernel] [k] intel_iommu_iotlb_sync_map
| 1.21% 000 kpktgend_0 [kernel] [k] xa_find_after
| 1.17% 000 kpktgend_0 [kernel] [k] ixgbe_poll
| 1.06% 000 kpktgend_0 [kernel] [k] __iommu_unmap
| 1.04% 000 kpktgend_0 [kernel] [k] intel_iommu_unmap_pages
| 1.01% 000 kpktgend_0 [kernel] [k] free_iova_fast
| 0.96% 000 kpktgend_0 [pktgen] [k] pktgen_thread_worker
The i40e box while sending:
|Samples: 400K of event 'cycles:P', 4000 Hz, Event count (approx.): 80512443924 lost: 0/0 drop: 0/0
|Overhead CPU Command Shared Object Symbol
| 24.04% 000 kpktgend_0 [kernel] [k] i40e_lan_xmit_frame
| 17.20% 019 swapper [kernel] [k] i40e_napi_poll
| 4.84% 019 swapper [kernel] [k] intel_idle_irq
| 4.20% 019 swapper [kernel] [k] napi_consume_skb
| 3.00% 000 kpktgend_0 [pktgen] [k] pktgen_thread_worker
| 2.76% 008 swapper [kernel] [k] i40e_napi_poll
| 2.36% 000 kpktgend_0 [kernel] [k] dma_map_page_attrs
| 1.93% 019 swapper [kernel] [k] dma_unmap_page_attrs
| 1.70% 008 swapper [kernel] [k] intel_idle_irq
| 1.44% 008 swapper [kernel] [k] __udp4_lib_rcv
| 1.44% 008 swapper [kernel] [k] __netif_receive_skb_core.constprop.0
| 1.40% 008 swapper [kernel] [k] napi_build_skb
| 1.28% 000 kpktgend_0 [kernel] [k] kfree_skb_reason
| 1.27% 008 swapper [kernel] [k] ip_rcv_core
| 1.19% 008 swapper [kernel] [k] inet_gro_receive
| 1.01% 008 swapper [kernel] [k] kmem_cache_free.part.0
> --Jesper
Sebastian