Message-ID: <D12839161ADD3A4B8DA63D1A134D084026E48BA027@ESGSCCMS0001.eapac.ericsson.se>
Date: Thu, 7 Apr 2011 19:15:10 +0800
From: Wei Gu <wei.gu@...csson.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: netdev <netdev@...r.kernel.org>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
Hi,
I compiled the ixgbe driver into the kernel and ran the test again. I also changed the copy to a clone in the fw hook.
This is the perf report while I was forwarding 150Kpps with
The attached file includes basic information about my test system. Please let me know if I did something wrong.
+ 71.91% swapper [kernel.kallsyms] [k] poll_idle
+ 10.43% swapper [kernel.kallsyms] [k] intel_idle
- 8.00% ksoftirqd/24 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      - 42.25% alloc_iova
           intel_alloc_iova
           __intel_map_single
           intel_map_page
         - dma_map_single_attrs.clone.3
            + 59.89% ixgbe_alloc_rx_buffers
            - 40.11% ixgbe_xmit_frame_ring
                 ixgbe_xmit_frame
                 dev_hard_start_xmit
                 sch_direct_xmit
                 dev_queue_xmit
                 vlan_dev_hard_start_xmit
                 hook_func
                 nf_iterate
                 nf_hook_slow
                 NF_HOOK.clone.1
                 ip_rcv
                 __netif_receive_skb
                 __netif_receive_skb
                 netif_receive_skb
                 napi_skb_finish
                 napi_gro_receive
                 ixgbe_clean_rx_irq
                 ixgbe_clean_rxtx_many
                 net_rx_action
                 __do_softirq
               + call_softirq
      + 36.30% find_iova
      + 20.89% add_unmap
+ 1.60% kworker/24:1 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
+ 0.80% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
+ 0.66% snmpd [kernel.kallsyms] [k] snmp_fold_field
+ 0.53% ksoftirqd/24 [kernel.kallsyms] [k] clflush_cache_range
If I zoom in on this ksoftirqd/24 thread:
+ 80.38% ksoftirqd/24 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
+ 5.35% ksoftirqd/24 [kernel.kallsyms] [k] clflush_cache_range
+ 1.49% ksoftirqd/24 [kernel.kallsyms] [k] __domain_mapping
+ 0.84% ksoftirqd/24 [kernel.kallsyms] [k] kmem_cache_alloc
+ 0.55% ksoftirqd/24 [kernel.kallsyms] [k] _raw_spin_lock
+ 0.54% ksoftirqd/24 [kernel.kallsyms] [k] ixgbe_xmit_frame_ring
+ 0.52% ksoftirqd/24 [kernel.kallsyms] [k] ixgbe_clean_rx_irq
+ 0.50% ksoftirqd/24 [kernel.kallsyms] [k] domain_get_iommu
+ 0.49% ksoftirqd/24 [kernel.kallsyms] [k] dma_map_single_attrs.clone.3
+ 0.48% ksoftirqd/24 [kernel.kallsyms] [k] kmem_cache_free
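To read the nested percentages in the call graph above as shares of all samples, each level can be multiplied through. This is a minimal sketch and it assumes the call-graph percentages are relative to their parent entry; the numbers are copied from the report, while the helper name `absolute_share` is only illustrative.

```python
# Shares copied from the perf report above. Treating nested percentages as
# "relative to the parent entry" is an assumption about the callchain mode.
TOTAL_SHARE = 8.00  # % of all samples: ksoftirqd/24 in _raw_spin_unlock_irqrestore

callers = {
    "alloc_iova": 42.25,
    "find_iova": 36.30,
    "add_unmap": 20.89,
}

alloc_iova_paths = {
    "ixgbe_alloc_rx_buffers": 59.89,
    "ixgbe_xmit_frame_ring": 40.11,
}


def absolute_share(parent_pct, child_pct):
    """Convert a parent-relative percentage into a share of all samples."""
    return parent_pct * child_pct / 100.0


# Share of all samples attributed to each caller of the unlock.
caller_abs = {name: absolute_share(TOTAL_SHARE, pct) for name, pct in callers.items()}

# Split the alloc_iova share between the RX refill path and the TX path.
path_abs = {
    name: absolute_share(caller_abs["alloc_iova"], pct)
    for name, pct in alloc_iova_paths.items()
}

for name, pct in caller_abs.items():
    print(f"{name:24s} {pct:5.2f}% of all samples")
for name, pct in path_abs.items():
    print(f"alloc_iova via {name:24s} {pct:5.2f}% of all samples")
```

Under that assumption, the three callers listed account for nearly all of the 8.00% entry, and all three sit on the DMA map/unmap side of the driver rather than in the forwarding hook itself.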
Perf top
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PerfTop: 10615 irqs/sec kernel:99.7% exact: 0.0% [1000Hz cpu-clock-msecs], (all, 64 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _______________________________ __________________________________________________________________________
11786.00 54.9% intel_idle [kernel.kallsyms]
7180.00 33.4% _raw_spin_unlock_irqrestore [kernel.kallsyms]
469.00 2.2% clflush_cache_range [kernel.kallsyms]
138.00 0.6% __domain_mapping [kernel.kallsyms]
81.00 0.4% dso__find_symbol /root/rpmbuild/BUILD/kernel-2.6.38.el6/linux-2.6.38.x86_64/tools/perf/perf
73.00 0.3% _raw_spin_lock [kernel.kallsyms]
72.00 0.3% dso__load_sym.clone.0 /root/rpmbuild/BUILD/kernel-2.6.38.el6/linux-2.6.38.x86_64/tools/perf/perf
68.00 0.3% kmem_cache_alloc [kernel.kallsyms]
53.00 0.2% symbol_filter /root/rpmbuild/BUILD/kernel-2.6.38.el6/linux-2.6.38.x86_64/tools/perf/perf
51.00 0.2% domain_get_iommu [kernel.kallsyms]
44.00 0.2% ixgbe_clean_rx_irq [kernel.kallsyms]
42.00 0.2% kmem_cache_free [kernel.kallsyms]
42.00 0.2% ixgbe_xmit_frame_ring [kernel.kallsyms]
41.00 0.2% ixgbe_clean_tx_irq [kernel.kallsyms]
40.00 0.2% dma_map_single_attrs.clone.3 [kernel.kallsyms]
Top:
Tasks: 425 total, 2 running, 423 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 96.0%id, 0.0%wa, 0.0%hi, 3.9%si, 0.0%st
Mem: 264733684k total, 6374016k used, 258359668k free, 43720k buffers
Swap: 4194300k total, 0k used, 4194300k free, 137308k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
79 root 20 0 0 0 0 R 38.8 0.0 29:22.85 24 ksoftirqd/24
233 root 20 0 0 0 0 S 7.6 0.0 4:06.60 24 kworker/24:1
1538 root 20 0 0 0 0 S 0.3 0.0 0:00.78 33 kworker/33:3
2271 root 20 0 200m 5564 1460 S 0.3 0.0 0:03.31 2 snmpd
Thanks
WeiGu
-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@...il.com]
Sent: Thursday, April 07, 2011 5:06 PM
To: Wei Gu
Cc: netdev; Alexander Duyck; Jeff Kirsher
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
On Thursday 07 April 2011 at 16:39 +0800, Wei Gu wrote:
> I only insert a prerouting hook to make a copy of the incoming
> packet, swap the L2/L3 headers, and send it back on the same interface.
>
Small packets or big ones?
You don't need to copy the packet; it's expensive.
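As a rough user-space illustration of why the copy is avoidable, the sketch below swaps the MAC and IPv4 addresses in place on a shared buffer. It is only a model, not the hook from this thread: the frame layout (untagged Ethernet plus a 20-byte IPv4 header), the sample addresses, and all function names are assumptions made for the example. A `bytearray` duplicate stands in for a full copy and a `memoryview` stands in for a second handle onto the same data.

```python
import struct

ETH_HLEN = 14                 # untagged Ethernet header (assumption: no VLAN tag in the buffer)
IP_SRC_OFF = ETH_HLEN + 12    # IPv4 source address offset
IP_DST_OFF = ETH_HLEN + 16    # IPv4 destination address offset


def ones_complement_sum(data):
    """16-bit one's-complement sum, as used by the IPv4 header checksum."""
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def build_frame(payload_len=26):
    """Hypothetical test frame: Ethernet + 20-byte IPv4 header + zero payload."""
    eth = bytes.fromhex("020000000001") + bytes.fromhex("020000000002") + b"\x08\x00"
    ip = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0,
                               64, 17, 0, bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2])))
    # Fill in the header checksum (computed with the checksum field still zero).
    ip[10:12] = struct.pack("!H", ~ones_complement_sum(bytes(ip)) & 0xFFFF)
    return bytearray(eth + bytes(ip) + bytes(payload_len))


def swap_fields(buf, a, b, length):
    """Swap two equal-length, non-overlapping fields in place."""
    first, second = bytes(buf[a:a + length]), bytes(buf[b:b + length])
    buf[a:a + length] = second
    buf[b:b + length] = first


def swap_l2_l3(buf):
    """Swap MAC addresses and IPv4 addresses; accepts a bytearray or writable memoryview."""
    swap_fields(buf, 0, 6, 6)
    swap_fields(buf, IP_SRC_OFF, IP_DST_OFF, 4)


def ipv4_checksum_ok(buf):
    """True if the 20-byte IPv4 header still verifies."""
    return ones_complement_sum(bytes(buf[ETH_HLEN:ETH_HLEN + 20])) == 0xFFFF


frame = build_frame()
original = bytes(frame)

full_copy = bytearray(frame)      # "copy": duplicates every byte of the frame
shared_view = memoryview(frame)   # "clone"-like: new handle onto the same bytes

swap_l2_l3(shared_view)           # rewrites only the 12 MAC bytes and 8 address bytes
print(len(full_copy), "bytes duplicated by the copy; 20 bytes rewritten by the swap")
print("IPv4 header checksum still valid:", ipv4_checksum_ok(frame))
```

Swapping the source and destination addresses only reorders 16-bit words of the header, so the IPv4 header checksum does not need to be recomputed. The model ignores the kernel's rules about writing to data shared between clones; it only shows how little of the packet the swap itself touches compared with duplicating the whole frame.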
> BTW, sometimes I notice that the perf tool is not mapping the
> symbols correctly. I don't know why.
>
You might try building ixgbe statically into the kernel, not as a module.
> I will try a fresh install of kernel 2.6.30 and do the test with the
> shipped ixgbe driver again.
>
OK thanks.
Download attachment "2.6.38_clone_fw_tar.gz" of type "application/x-gzip" (36172 bytes)