[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1274388439.2508.27.camel@edumazet-laptop>
Date: Thu, 20 May 2010 22:47:19 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: netdev <netdev@...r.kernel.org>, David Miller <davem@...emloft.net>
Cc: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Subject: [RFC] tcp: delack_timer expiration changes for every frame
While oprofiling net-next-2.6 during tcp workloads I found
mod_timer(delack_timer) was used too much, even if we receive/send more
than one frame per jiffie.
Something seems wrong, since we should try to change this timer only
when jiffies changes. mod_timer() has a special optimization for this,
but something is broken in our tcp stack ?
I added some logs in mod_timer() :
HZ = 250
results for one socket shown :
[ 392.116735] timer->expires=22997, expires=23024(37) diff=-27 timer=e5ecb754
[ 392.120627] timer->expires=23024, expires=22998(10) diff=26 timer=e5ecb754
[ 392.123245] timer->expires=22998, expires=23025(37) diff=-27 timer=e5ecb754
[ 392.133688] timer->expires=23025, expires=23001(10) diff=24 timer=e5ecb754
[ 392.136502] timer->expires=23001, expires=23029(37) diff=-28 timer=e5ecb754
[ 392.140392] timer->expires=23029, expires=23003(10) diff=26 timer=e5ecb754
[ 392.143142] timer->expires=23003, expires=23030(37) diff=-27 timer=e5ecb754
[ 392.153812] timer->expires=23030, expires=23006(10) diff=24 timer=e5ecb754
[ 392.156658] timer->expires=23006, expires=23034(37) diff=-28 timer=e5ecb754
[ 392.160474] timer->expires=23034, expires=23008(10) diff=26 timer=e5ecb754
[ 392.163317] timer->expires=23008, expires=23035(37) diff=-27 timer=e5ecb754
[ 392.167176] timer->expires=23035, expires=23009(10) diff=26 timer=e5ecb754
[ 392.176963] timer->expires=23009, expires=23039(37) diff=-30 timer=e5ecb754
[ 392.180863] timer->expires=23039, expires=23013(10) diff=26 timer=e5ecb754
[ 392.183577] timer->expires=23013, expires=23040(37) diff=-27 timer=e5ecb754
[ 392.187537] timer->expires=23040, expires=23014(10) diff=26 timer=e5ecb754
[ 392.197286] timer->expires=23014, expires=23044(37) diff=-30 timer=e5ecb754
[ 392.201047] timer->expires=23044, expires=23018(10) diff=26 timer=e5ecb754
[ 392.203761] timer->expires=23018, expires=23045(37) diff=-27 timer=e5ecb754
[ 392.207721] timer->expires=23045, expires=23019(10) diff=26 timer=e5ecb754
[ 392.217454] timer->expires=23019, expires=23049(37) diff=-30 timer=e5ecb754
So we change the delack_timer by a positive delta (~ HZ/10) and a
negative delta (~HZ/10), on the typical netperf TCP_RR workload.
Here, the incoming frame is handled by netperf, doing a recvmsg().
tcp_send_delayed_ack() sets the delack_timer to jiffies + HZ/25
[ 392.207721] timer->expires=23045, new expires=23019(10) diff=26 timer=e5ecb754
[ 392.207785] ------------[ cut here ]------------
[ 392.207846] WARNING: at kernel/timer.c:753 mod_timer+0x55/0x18e()
[ 392.207908] Hardware name: ProLiant BL460c G6
[ 392.207965] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler ipv6 dm_mod button battery ac ehci_hcd uhci_hcd tg3 libphy bnx2x crc32c libcrc32c mdio [last unloaded: x_tables]
[ 392.208900] Pid: 5320, comm: netperf Tainted: G W 2.6.34-06175-g801cae3-dirty #33
[ 392.208979] Call Trace:
[ 392.209036] [<c102df55>] ? warn_slowpath_common+0x5d/0x70
[ 392.209098] [<c102df73>] ? warn_slowpath_null+0xb/0xd
[ 392.209159] [<c10388de>] ? mod_timer+0x55/0x18e
[ 392.209221] [<c1279ce7>] ? tcp_send_delayed_ack+0xb5/0xc1
[ 392.209282] [<c1276d26>] ? tcp_rcv_established+0x39f/0x4f7
[ 392.209345] [<c127bae5>] ? tcp_v4_do_rcv+0x22/0x161
[ 392.209406] [<c126d6b4>] ? tcp_prequeue_process+0x47/0x5b
[ 392.209468] [<c12701d2>] ? tcp_recvmsg+0x371/0x691
[ 392.209529] [<c12b8c91>] ? _raw_spin_lock_bh+0x8/0x1e
[ 392.209590] [<c1240bcd>] ? release_sock+0x10/0xc9
[ 392.216514] [<c1285bad>] ? inet_recvmsg+0x5d/0x72
[ 392.216575] [<c123e725>] ? sock_recvmsg+0xb4/0xd1
[ 392.216636] [<c1032ace>] ? irq_exit+0x39/0x5b
[ 392.216696] [<c123fab5>] ? sys_recvfrom+0xb4/0x117
[ 392.216757] [<c10483be>] ? ktime_get+0x61/0xe8
[ 392.216817] [<c1016431>] ? lapic_next_event+0x13/0x16
[ 392.216878] [<c104ba9d>] ? clockevents_program_event+0xac/0xbc
[ 392.216940] [<c104c6cc>] ? tick_dev_program_event+0x34/0x138
[ 392.217002] [<c104c7ed>] ? tick_program_event+0x1d/0x21
[ 392.217064] [<c1044da0>] ? hrtimer_interrupt+0x10b/0x1c1
[ 392.217126] [<c123fb31>] ? sys_recv+0x19/0x1d
[ 392.217186] [<c12401dc>] ? sys_socketcall+0x120/0x1c6
[ 392.217303] [<c100268c>] ? sysenter_do_call+0x12/0x22
[ 392.217364] ---[ end trace e9475c06f1d49408 ]---
Here, the incoming frame is handled by the other side (netserver),
but still for the netperf socket, (softirq handling)
tcp_v4_rcv() sets the delack timer to 37 ticks, so mod_timer() optimizations is not
working at all.
[ 392.217454] timer->expires=23019, new expires=23049(37) diff=-30 timer=e5ecb754
[ 392.217518] ------------[ cut here ]------------
[ 392.217578] WARNING: at kernel/timer.c:753 mod_timer+0x55/0x18e()
[ 392.217639] Hardware name: ProLiant BL460c G6
[ 392.217697] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler ipv6 dm_mod button battery ac ehci_hcd uhci_hcd tg3 libphy bnx2x crc32c libcrc32c mdio [last unloaded: x_tables]
[ 392.218439] Pid: 5321, comm: netserver Tainted: G W 2.6.34-06175-g801cae3-dirty #33
[ 392.218526] Call Trace:
[ 392.218582] [<c102df55>] ? warn_slowpath_common+0x5d/0x70
[ 392.218644] [<c102df73>] ? warn_slowpath_null+0xb/0xd
[ 392.218705] [<c10388de>] ? mod_timer+0x55/0x18e
[ 392.218765] [<c127cf56>] ? tcp_v4_rcv+0x41c/0x6b7
[ 392.218826] [<c1265832>] ? ip_local_deliver_finish+0xe9/0x178
[ 392.218888] [<c126572a>] ? ip_rcv_finish+0x262/0x281
[ 392.218949] [<c1249986>] ? __netif_receive_skb+0x267/0x282
[ 392.219011] [<c1249a0d>] ? process_backlog+0x6c/0x113
[ 392.219072] [<c124a2e6>] ? net_rx_action+0x8a/0x15a
[ 392.219133] [<c106234e>] ? __rcu_process_callbacks+0xb9/0x1d1
[ 392.219195] [<c1032910>] ? __do_softirq+0x0/0x13a
[ 392.219255] [<c10329b5>] ? __do_softirq+0xa5/0x13a
[ 392.219316] [<c1032910>] ? __do_softirq+0x0/0x13a
[ 392.219375] <IRQ> [<c1032449>] ? local_bh_enable+0x5f/0x6a
[ 392.219474] [<c124c0be>] ? dev_queue_xmit+0x34d/0x37a
[ 392.219536] [<c1268115>] ? ip_finish_output+0x1c7/0x1ff
[ 392.219610] [<c1268255>] ? ip_local_out+0x18/0x1a
[ 392.219670] [<c12684f4>] ? ip_queue_xmit+0x29d/0x2d5
[ 392.219731] [<c1293e89>] ? bictcp_acked+0x4f/0x139
[ 392.219791] [<c1275ee8>] ? tcp_ack+0x155b/0x16e9
[ 392.219851] [<c127851a>] ? tcp_transmit_skb+0x62a/0x65f
[ 392.219912] [<c1038a03>] ? mod_timer+0x17a/0x18e
[ 392.219972] [<c127900a>] ? tcp_write_xmit+0x73a/0x81c
[ 392.220033] [<c109e487>] ? __kmalloc_node+0x30/0x76
[ 392.220095] [<c1279127>] ? __tcp_push_pending_frames+0x15/0x6c
[ 392.220159] [<c126f214>] ? tcp_sendmsg+0x7ee/0x8c5
[ 392.220223] [<c123e823>] ? sock_sendmsg+0xa7/0xc1
[ 392.220284] [<c1044da0>] ? hrtimer_interrupt+0x10b/0x1c1
[ 392.220346] [<c1032ace>] ? irq_exit+0x39/0x5b
[ 392.220406] [<c1016811>] ? smp_apic_timer_interrupt+0x6b/0x75
[ 392.220468] [<c102007b>] ? pud_huge+0x1/0x9
[ 392.220536] [<c123f9b9>] ? sys_sendto+0xfc/0x127
[ 392.220599] [<c1044bd4>] ? hrtimer_start_range_ns+0xf/0x13
[ 392.220661] [<c1023848>] ? update_curr+0x60/0xdf
[ 392.220722] [<c1044871>] ? hrtimer_forward+0x10f/0x123
[ 392.220784] [<c10483be>] ? ktime_get+0x61/0xe8
[ 392.220844] [<c123f9fd>] ? sys_send+0x19/0x1d
[ 392.220903] [<c12401af>] ? sys_socketcall+0xf3/0x1c6
[ 392.220964] [<c1032ace>] ? irq_exit+0x39/0x5b
[ 392.221024] [<c1016811>] ? smp_apic_timer_interrupt+0x6b/0x75
[ 392.221086] [<c100268c>] ? sysenter_do_call+0x12/0x22
[ 392.221147] ---[ end trace e9475c06f1d49409 ]---
Its a bit late here to investigate, maybe one of you guys have an idea about this...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists