Message-ID: <D12839161ADD3A4B8DA63D1A134D084026E48BA027@ESGSCCMS0001.eapac.ericsson.se>
Date:	Thu, 7 Apr 2011 19:15:10 +0800
From:	Wei Gu <wei.gu@...csson.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	netdev <netdev@...r.kernel.org>,
	Alexander Duyck <alexander.h.duyck@...el.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

Hi,
I compiled the ixgbe driver into the kernel and ran the test again. I also changed the copy to a clone in the fw hook.
This is the perf report while I was forwarding 150Kpps with
The attached file includes the basic information about my test system. Please let me know if I did something wrong.

+     71.91%          swapper  [kernel.kallsyms]            [k] poll_idle
+     10.43%          swapper  [kernel.kallsyms]            [k] intel_idle
-      8.00%     ksoftirqd/24  [kernel.kallsyms]            [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      - 42.25% alloc_iova
           intel_alloc_iova
           __intel_map_single
           intel_map_page
         - dma_map_single_attrs.clone.3
            + 59.89% ixgbe_alloc_rx_buffers
            - 40.11% ixgbe_xmit_frame_ring
                 ixgbe_xmit_frame
                 dev_hard_start_xmit
                 sch_direct_xmit
                 dev_queue_xmit
                 vlan_dev_hard_start_xmit
                 hook_func
                 nf_iterate
                 nf_hook_slow
                 NF_HOOK.clone.1
                 ip_rcv
                 __netif_receive_skb
                 __netif_receive_skb
                 netif_receive_skb
                 napi_skb_finish
                 napi_gro_receive
                 ixgbe_clean_rx_irq
                 ixgbe_clean_rxtx_many
                 net_rx_action
                 __do_softirq
               + call_softirq
      + 36.30% find_iova
      + 20.89% add_unmap
+      1.60%     kworker/24:1  [kernel.kallsyms]            [k] _raw_spin_unlock_irqrestore
+      0.80%          swapper  [kernel.kallsyms]            [k] _raw_spin_unlock_irqrestore
+      0.66%            snmpd  [kernel.kallsyms]            [k] snmp_fold_field
+      0.53%     ksoftirqd/24  [kernel.kallsyms]            [k] clflush_cache_range
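
The nested percentages in the call graph above appear to be relative to their parent entry rather than to all samples (this is an assumption about the perf callchain mode used here). Under that assumption, a small Python sketch converts the three callers of _raw_spin_unlock_irqrestore into shares of all samples. The names `parent_pct`, `callers` and `absolute_share` are illustrative only.

```python
# Share of all samples spent by ksoftirqd/24 in _raw_spin_unlock_irqrestore,
# taken from the perf report above.
parent_pct = 8.00

# Callers listed under that entry, with their (assumed relative) percentages.
callers = {"alloc_iova": 42.25, "find_iova": 36.30, "add_unmap": 20.89}


def absolute_share(parent: float, child: float) -> float:
    """Convert a percentage relative to the parent into a share of all samples."""
    return parent * child / 100.0


absolute = {name: absolute_share(parent_pct, pct) for name, pct in callers.items()}

for name, pct in absolute.items():
    print(f"{name:12s} {pct:5.2f}% of all samples")
```

All three callers are on the IOMMU mapping path (IOVA allocation, lookup and deferred unmap), so together they account for nearly the whole 8.00% entry.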


If I zoom in on the ksoftirqd/24 thread only:
+     80.38%  ksoftirqd/24  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
+      5.35%  ksoftirqd/24  [kernel.kallsyms]  [k] clflush_cache_range
+      1.49%  ksoftirqd/24  [kernel.kallsyms]  [k] __domain_mapping
+      0.84%  ksoftirqd/24  [kernel.kallsyms]  [k] kmem_cache_alloc
+      0.55%  ksoftirqd/24  [kernel.kallsyms]  [k] _raw_spin_lock
+      0.54%  ksoftirqd/24  [kernel.kallsyms]  [k] ixgbe_xmit_frame_ring
+      0.52%  ksoftirqd/24  [kernel.kallsyms]  [k] ixgbe_clean_rx_irq
+      0.50%  ksoftirqd/24  [kernel.kallsyms]  [k] domain_get_iommu
+      0.49%  ksoftirqd/24  [kernel.kallsyms]  [k] dma_map_single_attrs.clone.3
+      0.48%  ksoftirqd/24  [kernel.kallsyms]  [k] kmem_cache_free

Perf top

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   PerfTop:   10615 irqs/sec  kernel:99.7%  exact:  0.0% [1000Hz cpu-clock-msecs],  (all, 64 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                        DSO
             _______ _____ _______________________________ __________________________________________________________________________

            11786.00 54.9% intel_idle                      [kernel.kallsyms]
             7180.00 33.4% _raw_spin_unlock_irqrestore     [kernel.kallsyms]
              469.00  2.2% clflush_cache_range             [kernel.kallsyms]
              138.00  0.6% __domain_mapping                [kernel.kallsyms]
               81.00  0.4% dso__find_symbol                /root/rpmbuild/BUILD/kernel-2.6.38.el6/linux-2.6.38.x86_64/tools/perf/perf
               73.00  0.3% _raw_spin_lock                  [kernel.kallsyms]
               72.00  0.3% dso__load_sym.clone.0           /root/rpmbuild/BUILD/kernel-2.6.38.el6/linux-2.6.38.x86_64/tools/perf/perf
               68.00  0.3% kmem_cache_alloc                [kernel.kallsyms]
               53.00  0.2% symbol_filter                   /root/rpmbuild/BUILD/kernel-2.6.38.el6/linux-2.6.38.x86_64/tools/perf/perf
               51.00  0.2% domain_get_iommu                [kernel.kallsyms]
               44.00  0.2% ixgbe_clean_rx_irq              [kernel.kallsyms]
               42.00  0.2% kmem_cache_free                 [kernel.kallsyms]
               42.00  0.2% ixgbe_xmit_frame_ring           [kernel.kallsyms]
               41.00  0.2% ixgbe_clean_tx_irq              [kernel.kallsyms]
               40.00  0.2% dma_map_single_attrs.clone.3    [kernel.kallsyms]


Top:

Tasks: 425 total,   2 running, 423 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 96.0%id,  0.0%wa,  0.0%hi,  3.9%si,  0.0%st
Mem:  264733684k total,  6374016k used, 258359668k free,    43720k buffers
Swap:  4194300k total,        0k used,  4194300k free,   137308k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+   P COMMAND
   79 root      20   0     0    0    0 R 38.8  0.0  29:22.85 24 ksoftirqd/24
  233 root      20   0     0    0    0 S  7.6  0.0   4:06.60 24 kworker/24:1
 1538 root      20   0     0    0    0 S  0.3  0.0   0:00.78 33 kworker/33:3
 2271 root      20   0  200m 5564 1460 S  0.3  0.0   0:03.31  2 snmpd


Thanks
WeiGu

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@...il.com]
Sent: Thursday, April 07, 2011 5:06 PM
To: Wei Gu
Cc: netdev; Alexander Duyck; Jeff Kirsher
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

On Thursday, 07 April 2011 at 16:39 +0800, Wei Gu wrote:
> I only insert a prerouting hook to make a copy of the incoming
> packet, swap the L2/L3 headers, and send it back on the same interface.
>

Small packets or big ones ?

You don't need to copy the packet; it's expensive.
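
For readers following the thread, the header swap described in the quoted text can be illustrated in user space. This is only a sketch on raw bytes, not the kernel hook from the test: it assumes an untagged Ethernet II frame carrying IPv4, and `swap_l2_l3`, `ipv4_checksum` and `build_frame` are hypothetical helper names.

```python
import struct

ETH_HLEN = 14            # Ethernet II header without a VLAN tag (assumption)
ETHERTYPE_IPV4 = 0x0800


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over an IPv4 header (returns 0 if it is valid)."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def swap_l2_l3(frame: bytes) -> bytes:
    """Return a new frame with MAC and IPv4 source/destination addresses swapped."""
    dst_mac, src_mac = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("only untagged IPv4 frames are handled in this sketch")
    ip = ETH_HLEN
    src_ip, dst_ip = frame[ip + 12:ip + 16], frame[ip + 16:ip + 20]
    # EtherType plus the first 12 bytes of the IPv4 header are kept unchanged.
    return src_mac + dst_mac + frame[12:ip + 12] + dst_ip + src_ip + frame[ip + 20:]


def build_frame() -> bytes:
    """Build a minimal example frame: Ethernet header plus a 20-byte IPv4 header."""
    eth = bytes.fromhex("0200000000020200000000010800")
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 17, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    csum = ipv4_checksum(ip)  # checksum field is zero at this point
    return eth + ip[:10] + struct.pack("!H", csum) + ip[12:]


frame = build_frame()
swapped = swap_l2_l3(frame)
```

Swapping the two IPv4 addresses only reorders 16-bit words in the header, so the header checksum stays valid and does not need to be recomputed.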


> BTW, sometimes I notice that the perf tool does not map the
> symbols correctly. I don't know why.
>

You might try building ixgbe statically into the kernel, not as a module.

> I will try a fresh install of kernel 2.6.30 and do the test with the
> shipped ixgbe driver again.
>

OK thanks.






Download attachment "2.6.38_clone_fw_tar.gz" of type "application/x-gzip" (36172 bytes)
