Message-Id: <20191101221605.32210-1-xiyou.wangcong@gmail.com>
Date: Fri, 1 Nov 2019 15:16:05 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: netdev@...r.kernel.org
Cc: Cong Wang <xiyou.wangcong@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Eric Dumazet <edumazet@...gle.com>
Subject: [RFC Patch] tcp: make icsk_retransmit_timer pinned
While investigating the spinlock contention on resetting TCP
retransmit timer:
    61.72%  61.71%  swapper  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
            ...
            - 58.83% tcp_v4_rcv
               - 58.80% tcp_v4_do_rcv
                  - 58.80% tcp_rcv_established
                     - 52.88% __tcp_push_pending_frames
                        - 52.88% tcp_write_xmit
                           - 28.16% tcp_event_new_data_sent
                              - 28.15% sk_reset_timer
                                 + mod_timer
                           - 24.68% tcp_schedule_loss_probe
                              - 24.68% sk_reset_timer
                                 + 24.68% mod_timer
it turns out to be a serious timer migration issue. The timer_start trace
events collected for tcp_write_timer show that this timer got migrated to a
different CPU more than 77% of the time:
$ perl -ne 'if (/\[(\d+)\].* cpu=(\d+)/){print if $1 != $2 ;}' tcp_timer_trace.txt | wc -l
1303826
$ wc -l tcp_timer_trace.txt
1681068 tcp_timer_trace.txt
$ python
Python 2.7.5 (default, Jul 11 2019, 17:13:53)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 1303826 / 1681068.0
0.7755938486723916
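
The perl filter and the ratio above can also be done in one pass. What follows
is only a minimal sketch: it reuses the same regular expression as the perl
one-liner, and the two sample lines are made up for illustration (they are not
taken from the real tcp_timer_trace.txt, whose exact field layout is assumed
here).

```python
import re

# Same pattern as the perl one-liner: the bracketed CPU the event was
# recorded on, and the cpu= field of the timer_start event.
PATTERN = re.compile(r"\[(\d+)\].* cpu=(\d+)")


def migration_ratio(lines):
    """Return (migrated, total, ratio) over an iterable of trace lines."""
    migrated = 0
    total = 0
    for line in lines:
        total += 1  # like `wc -l`, every line counts toward the total
        m = PATTERN.search(line)
        # Compare numerically, as perl's != does ("002" equals "2").
        if m and int(m.group(1)) != int(m.group(2)):
            migrated += 1
    ratio = migrated / float(total) if total else 0.0
    return migrated, total, ratio


# Hypothetical sample lines, invented for this example only.
sample = [
    "<idle>-0 [002] ..s. 100.000001: timer_start: function=tcp_write_timer cpu=5",
    "<idle>-0 [003] ..s. 100.000002: timer_start: function=tcp_write_timer cpu=3",
]

if __name__ == "__main__":
    print(migration_ratio(sample))
```

On the real trace one would pass the open file object instead of `sample`.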
And all of those migrations happened while an idle CPU was serving a network RX
softirq. So the idleness test in idle_cpu() gives a false positive here.
I don't know whether we should relax it for this scenario in particular,
something like:
-	if (!idle_cpu(cpu) && housekeeping_cpu(cpu, HK_FLAG_TIMER))
+	if ((!idle_cpu(cpu) || in_serving_softirq()) &&
+	    housekeeping_cpu(cpu, HK_FLAG_TIMER))
 		return cpu;
(There could be a better way than in_serving_softirq() to measure the idleness,
of course.)
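
To see which cases the relaxed condition above would actually change, the two
checks can be modelled as plain boolean functions and compared over all inputs.
This is only a user-space sketch: the function names are hypothetical, the
three booleans stand in for idle_cpu(cpu), in_serving_softirq() and
housekeeping_cpu(cpu, HK_FLAG_TIMER), and it assumes that "return cpu" means
the timer stays on the current CPU.

```python
from itertools import product


def keeps_local_current(idle, in_softirq, housekeeping):
    # Models: !idle_cpu(cpu) && housekeeping_cpu(cpu, HK_FLAG_TIMER)
    return (not idle) and housekeeping


def keeps_local_proposed(idle, in_softirq, housekeeping):
    # Models: (!idle_cpu(cpu) || in_serving_softirq()) &&
    #         housekeeping_cpu(cpu, HK_FLAG_TIMER)
    return ((not idle) or in_softirq) and housekeeping


# Every (idle, in_softirq, housekeeping) combination where the two differ.
changed = [
    combo
    for combo in product([False, True], repeat=3)
    if keeps_local_current(*combo) != keeps_local_proposed(*combo)
]

if __name__ == "__main__":
    print(changed)
```

Under this model the only combination that behaves differently is an idle
housekeeping CPU that is currently serving a softirq, which is the case
observed in the trace.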
Or simply make the TCP retransmit timer pinned. At least this approach
has the smallest impact.
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Eric Dumazet <edumazet@...gle.com>
Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
---
net/ipv4/inet_connection_sock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index eb30fc1770de..de5510ddb1c8 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -507,7 +507,7 @@ void inet_csk_init_xmit_timers(struct sock *sk,
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
-	timer_setup(&icsk->icsk_retransmit_timer, retransmit_handler, 0);
+	timer_setup(&icsk->icsk_retransmit_timer, retransmit_handler, TIMER_PINNED);
 	timer_setup(&icsk->icsk_delack_timer, delack_handler, 0);
 	timer_setup(&sk->sk_timer, keepalive_handler, 0);
 	icsk->icsk_pending = icsk->icsk_ack.pending = 0;
--
2.21.0