Message-ID: <CAL8zT=gNtcdyyVcPt5hB6jyF1btzQArEuZgVKWkb0Wd=a4LcVA@mail.gmail.com>
Date: Tue, 12 Jun 2012 10:24:03 +0200
From: Jean-Michel Hautbois <jhautbois@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Sathya.Perla@...lex.com, netdev@...r.kernel.org
Subject: Re: Difficulties to get 1Gbps on be2net ethernet card
2012/6/8 Jean-Michel Hautbois <jhautbois@...il.com>:
> 2012/6/8 Eric Dumazet <eric.dumazet@...il.com>:
>> On Fri, 2012-06-08 at 10:14 +0200, Jean-Michel Hautbois wrote:
>>> 2012/6/8 Eric Dumazet <eric.dumazet@...il.com>:
>>> > On Thu, 2012-06-07 at 14:54 +0200, Jean-Michel Hautbois wrote:
>>> >
>>> >> eth1 Link encap:Ethernet HWaddr 68:b5:99:b9:8d:d4
>>> >> UP BROADCAST RUNNING SLAVE MULTICAST MTU:4096 Metric:1
>>> >> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>> >> TX packets:15215387 errors:0 dropped:0 overruns:0 carrier:0
>>> >> collisions:0 txqueuelen:1000
>>> >> RX bytes:0 (0.0 B) TX bytes:61476524359 (57.2 GiB)
>>> >
>>> >> qdisc mq 0: dev eth1 root
>>> >> Sent 61476524359 bytes 15215387 pkt (dropped 45683472, overlimits 0
>>> >> requeues 17480)
>>> >
>>> > OK, and "tc -s -d cl show dev eth1"
>>> >
>>> > (How many queues are really used)
>>> >
>>> >
>>> >
>>>
>>> tc -s -d cl show dev eth1
>>> class mq :1 root
>>> Sent 9798071746 bytes 2425410 pkt (dropped 3442405, overlimits 0 requeues 2747)
>>> backlog 0b 0p requeues 2747
>>> class mq :2 root
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> class mq :3 root
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> class mq :4 root
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> class mq :5 root
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> class mq :6 root
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> class mq :7 root
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> class mq :8 root
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>
>>
>> Do you have the same distribution on old kernels as well ?
>> (ie only queue 0 is used)
>>
>>
>>
>
> On the old kernel, there is nothing returned by this command.
>
> JM
I used perf in order to get more information.
Here is the result of perf record -a sleep 10 (I kept only the kernel part):
     6.93%  ModuleTester  [kernel.kallsyms]    [k] copy_user_generic_string
     2.99%  swapper       [kernel.kallsyms]    [k] mwait_idle
     2.60%  kipmi0        [ipmi_si]            [k] port_inb
     1.75%  swapper       [kernel.kallsyms]    [k] rb_prev
     1.63%  ModuleTester  [kernel.kallsyms]    [k] _raw_spin_lock
     1.43%  NodeManager   [kernel.kallsyms]    [k] delay_tsc
     0.90%  ModuleTester  [kernel.kallsyms]    [k] clear_page_c
     0.88%  ModuleTester  [kernel.kallsyms]    [k] dev_queue_xmit
     0.80%  ip            [kernel.kallsyms]    [k] snmp_fold_field
     0.73%  ModuleTester  [kernel.kallsyms]    [k] clflush_cache_range
     0.69%  grep          [kernel.kallsyms]    [k] page_fault
     0.61%  sh            [kernel.kallsyms]    [k] page_fault
     0.59%  ModuleTester  [kernel.kallsyms]    [k] udp_sendmsg
     0.55%  ModuleTester  [kernel.kallsyms]    [k] _raw_read_lock
     0.53%  sh            [kernel.kallsyms]    [k] unmap_vmas
     0.52%  ModuleTester  [kernel.kallsyms]    [k] rb_prev
     0.51%  ModuleTester  [kernel.kallsyms]    [k] find_busiest_group
     0.49%  ModuleTester  [kernel.kallsyms]    [k] __ip_make_skb
     0.48%  ModuleTester  [kernel.kallsyms]    [k] sock_alloc_send_pskb
     0.48%  ModuleTester  libpthread-2.7.so    [.] pthread_mutex_lock
     0.47%  ModuleTester  [kernel.kallsyms]    [k] __netif_receive_skb
     0.44%  ip            [kernel.kallsyms]    [k] find_next_bit
     0.43%  swapper       [kernel.kallsyms]    [k] clflush_cache_range
     0.41%  ps            [kernel.kallsyms]    [k] format_decode
     0.41%  ModuleTester  [bonding]            [k] bond_start_xmit
     0.39%  ModuleTester  [be2net]             [k] be_xmit
     0.39%  ModuleTester  [kernel.kallsyms]    [k] __ip_append_data
     0.38%  ModuleTester  [kernel.kallsyms]    [k] netif_rx
     0.37%  swapper       [be2net]             [k] be_poll
     0.37%  swapper       [kernel.kallsyms]    [k] ktime_get
     0.37%  sh            [kernel.kallsyms]    [k] copy_page_c
     0.36%  swapper       [kernel.kallsyms]    [k] irq_entries_start
     0.36%  ModuleTester  [kernel.kallsyms]    [k] __alloc_pages_nodemask
     0.35%  ModuleTester  [kernel.kallsyms]    [k] __slab_free
     0.35%  ModuleTester  [kernel.kallsyms]    [k] ip_mc_output
     0.34%  ModuleTester  [kernel.kallsyms]    [k] skb_release_data
     0.33%  ip            [kernel.kallsyms]    [k] page_fault
     0.33%  ModuleTester  [kernel.kallsyms]    [k] udp_send_skb
And here is the perf record -a result without bonding:
     2.49%  ModuleTester  [kernel.kallsyms]    [k] csum_partial_copy_generic
     1.35%  ModuleTester  [kernel.kallsyms]    [k] _raw_spin_lock
     1.29%  ModuleTester  [kernel.kallsyms]    [k] clflush_cache_range
     1.16%  jobprocess    [kernel.kallsyms]    [k] rb_prev
     1.01%  jobprocess    [kernel.kallsyms]    [k] clflush_cache_range
     0.81%  ModuleTester  [be2net]             [k] be_xmit
     0.78%  jobprocess    [kernel.kallsyms]    [k] __slab_free
     0.77%  swapper       [kernel.kallsyms]    [k] mwait_idle
     0.72%  ModuleTester  [kernel.kallsyms]    [k] __domain_mapping
     0.66%  jobprocess    [kernel.kallsyms]    [k] _raw_spin_lock
     0.59%  jobprocess    [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
     0.56%  ModuleTester  [kernel.kallsyms]    [k] rb_prev
     0.53%  swapper       [kernel.kallsyms]    [k] rb_prev
     0.49%  ModuleTester  [kernel.kallsyms]    [k] sock_wmalloc
     0.47%  jobprocess    [be2net]             [k] be_poll
     0.47%  ModuleTester  [kernel.kallsyms]    [k] kmem_cache_alloc
     0.47%  swapper       [kernel.kallsyms]    [k] clflush_cache_range
     0.45%  kipmi0        [ipmi_si]            [k] port_inb
     0.42%  swapper       [kernel.kallsyms]    [k] __slab_free
     0.41%  jobprocess    [kernel.kallsyms]    [k] try_to_wake_up
     0.40%  ModuleTester  [kernel.kallsyms]    [k] kmem_cache_alloc_node
     0.40%  jobprocess    [kernel.kallsyms]    [k] tg_load_down
     0.39%  jobprocess    libodyssey.so.1.8.2  [.] y8_deblocking_luma_vert_edge_h264_sse2
     0.38%  jobprocess    libodyssey.so.1.8.2  [.] y8_deblocking_luma_horz_edge_h264_ssse3
     0.38%  ModuleTester  [kernel.kallsyms]    [k] rb_insert_color
     0.37%  jobprocess    [kernel.kallsyms]    [k] find_iova
     0.37%  jobprocess    [kernel.kallsyms]    [k] find_busiest_group
     0.36%  jobprocess    libpthread-2.7.so    [.] pthread_mutex_lock
     0.35%  swapper       [kernel.kallsyms]    [k] _raw_spin_unlock_irqrestore
     0.34%  ModuleTester  [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
     0.33%  ModuleTester  [kernel.kallsyms]    [k] pfifo_fast_dequeue
     0.32%  ModuleTester  [kernel.kallsyms]    [k] __kmalloc_node_track_caller
     0.32%  jobprocess    [be2net]             [k] be_tx_compl_process
     0.31%  ModuleTester  [kernel.kallsyms]    [k] ip_fragment
     0.29%  swapper       [kernel.kallsyms]    [k] __hrtimer_start_range_ns
     0.29%  jobprocess    [kernel.kallsyms]    [k] __schedule
     0.29%  ModuleTester  [kernel.kallsyms]    [k] dev_queue_xmit
     0.28%  swapper       [kernel.kallsyms]    [k] __schedule
The first thing I notice is the difference in copy_user_generic_string (it
is only 0.11% in the second measurement, so I didn't report it here).
I think perf can help in finding the issue I observe with bonding;
perhaps you have suggestions on which parameters to use?
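For instance, I could redo the capture with call graphs (just my guess at
useful options), something along the lines of:

    perf record -a -g sleep 10
    perf report --sort comm,dso,symbol

if you think the call chains would be more useful than the flat profile above.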
FYI, with bonding TX goes up to 640 Mbps, whereas without bonding I can
send 2.4 Gbps without any trouble...
JM