[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49EA2D7F.3080405@cosmosbay.com>
Date: Sat, 18 Apr 2009 21:43:59 +0200
From: Eric Dumazet <dada1@...mosbay.com>
To: Christoph Lameter <cl@...ux.com>
CC: David Miller <davem@...emloft.net>,
Michael Chan <mchan@...adcom.com>,
Ben Hutchings <bhutchings@...arflare.com>,
netdev@...r.kernel.org
Subject: Re: Network latency regressions from 2.6.22 to 2.6.29 (results with
IRQ affinity)
Christoph Lameter a écrit :
> On Thu, 16 Apr 2009, David Miller wrote:
>
>> We really need to find a good way to fix that IRQ affinity issue.
Yep... this is annoying, but disable_msi=1 can help
>
> Here are the results with setting interrupt affinity. We are still loosing
> 5 usecs from .22 to .30-rc2. We will investigate more in depth time
> permitting.
>
> UDP ping pong 40 bytes
>
> Kernel Test 1 Test 2 Test 3 Test 4 Avg
> 2.6.22 83.1125 83.4 83.03 82.95 83.07
> 2.6.23 83.33 83.51 83.18 83.42 83.21
> 2.6.24 82.7775 82.8 82.77 82.77 82.77
> 2.6.25 85.875 85.94 85.75 86.06 85.75
> 2.6.26 87.4025 87.53 87.25 87.3 87.53
> 2.6.27 87.81 87.81 87.77 87.82 87.84
> 2.6.28 87.4275 87.77 87.24 87.28 87.42
> 2.6.29 88.4325 88.48 88.42 88.44 88.39
> 2.6.30-rc2 88.4925 88.49 88.44 88.56 88.48
I did my tests with 3 machines, trying to reproduce your results and failed.
So its good news for us (we can go back watching TV :) )
192.168.20.110 : linux-2.6 latest git tree, receiver "newkernel"
192.168.20.120 : linux-2.6.22, receiver "oldkernel"
192.168.20.112 : 2.6.28.7 as the sender machine (we dont really care)
I only took traces on receivers to rule out different coalescing delays
of NIC that could depend on firmware or version
bnx2 driver statically included in vmlinux.
(on newkernel, its "disable_msi" set to 1 (because this is the only way my BCM5708S can
have interrupt affinities set by the admin used by the hardware on 2.6.28+)
Result for both new and old kernel :
3 us of latency to answer to a ICMP ping request, if irq affinity correct.
This 3 us includes the overhead of timestamping and pcap copy of packets
since I have my tcpdump launched.
In my case I want NIC IRQs on CPU0 : "echo 1 >/proc/irq/16/smp_affinity"
(trace taken on the receiver side)
ICMP Ping for 2.6.30-rc2 : 3us, 3us, 3us, 3us
15:11:52.856072 IP 192.168.20.112 > 192.168.20.110: ICMP echo request, id 7957, seq 3330, length 64
15:11:52.856075 IP 192.168.20.110 > 192.168.20.112: ICMP echo reply, id 7957, seq 3330, length 64
15:11:52.856168 IP 192.168.20.112 > 192.168.20.110: ICMP echo request, id 7957, seq 3331, length 64
15:11:52.856171 IP 192.168.20.110 > 192.168.20.112: ICMP echo reply, id 7957, seq 3331, length 64
15:11:52.856264 IP 192.168.20.112 > 192.168.20.110: ICMP echo request, id 7957, seq 3332, length 64
15:11:52.856267 IP 192.168.20.110 > 192.168.20.112: ICMP echo reply, id 7957, seq 3332, length 64
15:11:52.856359 IP 192.168.20.112 > 192.168.20.110: ICMP echo request, id 7957, seq 3333, length 64
15:11:52.856362 IP 192.168.20.110 > 192.168.20.112: ICMP echo reply, id 7957, seq 3333, length 64
ICMP Ping for 2.6.22: 3us, 3us, 3us, 3us : Exactly same latencies than current kernel
20:35:06.375011 IP 192.168.20.112 > 192.168.20.120: icmp 64: echo request seq 110
20:35:06.375014 IP 192.168.20.120 > 192.168.20.112: icmp 64: echo reply seq 110
20:35:06.375106 IP 192.168.20.112 > 192.168.20.120: icmp 64: echo request seq 111
20:35:06.375109 IP 192.168.20.120 > 192.168.20.112: icmp 64: echo reply seq 111
20:35:06.375201 IP 192.168.20.112 > 192.168.20.120: icmp 64: echo request seq 112
20:35:06.375204 IP 192.168.20.120 > 192.168.20.112: icmp 64: echo reply seq 112
20:35:06.375296 IP 192.168.20.112 > 192.168.20.120: icmp 64: echo request seq 113
20:35:06.375299 IP 192.168.20.120 > 192.168.20.112: icmp 64: echo reply seq 113
Answering to a ping involves the regular network stack, with softirq and all...
But we know for sure that the reply comes from the cpu that handled the rx, so
no false sharing between cpus, and that no user land process was involved, only softirqs.
Now with your udpping I get very variables times : 21 us, 14 us, 27 us, 11 us on following random samples
(trace taken on receiver side again)
15:18:50.256238 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
15:18:50.256259 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
15:18:50.256371 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
15:18:50.256385 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
15:18:50.256498 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
15:18:50.256525 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
15:18:50.256638 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
15:18:50.256649 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
Then if I force uddpping receiver running on CPU2 (same physical CPU receiving NIC interrupts,
but a different core) :
I get : 13us, 11us, 10us, 10us, which is not bad.
21:07:48.823741 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:07:48.823754 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:07:48.823852 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:07:48.823863 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:07:48.823962 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:07:48.823972 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:07:48.824087 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:07:48.824097 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
If run on another physical cpu : I get 15us, 12us, 12us, 12us
21:09:10.667120 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:09:10.667135 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:09:10.667234 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:09:10.667246 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:09:10.667345 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:09:10.667357 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:09:10.667471 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:09:10.667483 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
If I force udpping to same cpu than IRQS (CPU0) I get : 7us, 8us, 7us, 7us
And yes, this is very good and expected IMHO :)
21:04:23.780421 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:04:23.780428 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:04:23.780539 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:04:23.780547 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:04:23.780663 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:04:23.780670 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
21:04:23.780783 IP 192.168.20.112.9001 > 192.168.20.110.9000: UDP, length 300
21:04:23.780790 IP 192.168.20.110.9000 > 192.168.20.112.9001: UDP, length 300
Now running the receiver on 2.6.22
udpping: 6us, 15us, 14us, 13us
16:34:34.266069 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
16:34:34.266075 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300
16:34:34.266093 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
16:34:34.266108 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300
16:34:34.266209 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
16:34:34.266223 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300
16:34:34.266323 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
16:34:34.266335 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300
Same cpu for user process and IRQS :
udpping : 7us, 9us, 8us, 10us : Very good, as we could expect !
20:39:07.106698 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
20:39:07.106705 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300
20:39:07.106808 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
20:39:07.106819 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300
20:39:07.106918 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
20:39:07.106926 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300
20:39:07.107046 IP 192.168.20.112.9001 > 192.168.20.120.9000: UDP, length 300
20:39:07.107056 IP 192.168.20.120.9000 > 192.168.20.112.9001: UDP, length 300
But in the overall, I get litle bit better results with 2.6.30-rc2 : 7 us
I did an oprofile session on 2.6.30-rc2 for the 'best case' (7us of latency)
tracing only kernel events.
(and unfortunalty IRQ16 is shared by eth0, eth1 and uhci_hcd:usb1
CPU: Core 2, speed 3000.19 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10000
samples cum. samples % cum. % symbol name
252423 252423 19.4717 19.4717 mwait_idle
99195 351618 7.6518 27.1235 apic_timer_interrupt
64108 415726 4.9452 32.0687 bnx2_interrupt
62195 477921 4.7977 36.8664 find_busiest_group
60022 537943 4.6300 41.4964 scheduler_tick
40772 578715 3.1451 44.6415 read_tsc
38550 617265 2.9737 47.6152 uhci_irq
36074 653339 2.7827 50.3980 sched_clock_cpu
25319 678658 1.9531 52.3510 rb_get_reader_page
25114 703772 1.9373 54.2883 native_sched_clock
23753 727525 1.8323 56.1206 getnstimeofday
22103 749628 1.7050 57.8256 ktime_get
19924 769552 1.5369 59.3625 hrtimer_interrupt
18999 788551 1.4656 60.8281 ktime_get_ts
17910 806461 1.3816 62.2096 tick_check_oneshot_broadcast
17664 824125 1.3626 63.5722 ring_buffer_consume
13354 837479 1.0301 64.6023 irq_entries_start
12885 850364 0.9939 65.5963 sched_clock_tick
12270 862634 0.9465 66.5428 tick_sched_timer
11232 873866 0.8664 67.4092 __schedule
10607 884473 0.8182 68.2274 softlockup_tick
9446 893919 0.7287 68.9561 raise_softirq
8483 902402 0.6544 69.6104 run_timer_softirq
8295 910697 0.6399 70.2503 __do_softirq
8208 918905 0.6332 70.8835 copy_to_user
7929 926834 0.6116 71.4951 run_rebalance_domains
7478 934312 0.5768 72.0720 bnx2_poll_work
6872 941184 0.5301 72.6021 cpu_idle
6267 947451 0.4834 73.0855 rcu_pending
5953 953404 0.4592 73.5447 rb_advance_reader
5781 959185 0.4459 73.9906 handle_fasteoi_irq
5762 964947 0.4445 74.4351 cpu_clock
5615 970562 0.4331 74.8682 udp_sendmsg
5544 976106 0.4277 75.2959 __skb_recv_datagram
5493 981599 0.4237 75.7196 set_normalized_timespec
5437 987036 0.4194 76.1390 update_wall_time
5282 992318 0.4074 76.5465 smp_apic_timer_interrupt
5097 997415 0.3932 76.9397 ip_append_data
5034 1002449 0.3883 77.3280 find_next_bit
my vmstat 1 on 2.6.30-rC2 receiver during test :
0 0 0 4047312 12944 40156 0 0 0 0 64231 26101 0 1 99 0
0 1 0 4047312 12948 40156 0 0 0 32 64297 25762 0 1 99 0
0 0 0 4047304 12952 40156 0 0 0 8 64650 25510 0 1 99 0
0 0 0 4047312 12960 40152 0 0 0 344 64533 25434 0 1 98 1
0 0 0 4047312 12960 40156 0 0 0 0 64279 25628 0 1 99 0
0 0 0 4047312 12960 40156 0 0 0 4 64606 25356 0 1 99 0
0 0 0 4047312 12960 40156 0 0 0 0 64253 25732 0 1 99 0
0 1 0 4047304 12968 40152 0 0 0 40 64735 25211 0 1 99 0
0 0 0 4047312 12968 40156 0 0 0 356 64638 25443 0 1 99 0
0 0 0 4047312 12968 40156 0 0 0 0 64269 25608 0 1 99 0
0 0 0 4047312 12968 40156 0 0 0 0 64562 25120 0 1 99 0
1 0 0 4047312 12968 40156 0 0 0 4 64668 24811 0 1 99 0
My three machines have same hardware :
HP ProLiant BL460c G1
2 quad core Intel(R) Xeon(R) CPU E5450 @ 3.00GHz FSB 1333MHz
4GB ram (667 MHz)
Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
32 bit kernel
In conclusion, something might be wrong in your methodology, or your hardware, or...
Please Christoph, would you mind posting more information on your config ?
Any chance you have CONFIG_PREEMT enabled or any expensive kernel debug thing ?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists