[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240117231646.22853-1-dipiets@amazon.com>
Date: Wed, 17 Jan 2024 15:16:46 -0800
From: Salvatore Dipietro <dipiets@...zon.com>
To: <edumazet@...gle.com>
CC: <alisaidi@...zon.com>, <benh@...zon.com>, <blakgeof@...zon.com>,
<davem@...emloft.net>, <dipietro.salvatore@...il.com>, <dipiets@...zon.com>,
<dsahern@...nel.org>, <kuba@...nel.org>, <netdev@...r.kernel.org>,
<pabeni@...hat.com>
Subject: [PATCH v3] tcp: Add memory barrier to tcp_push()
On CPUs with weak memory models, reads and updates performed by tcp_push to the
sk variables can get reordered leaving the socket throttled when it should not.
The tasklet running tcp_wfree() may also not observe the memory updates in time
and will skip flushing any packets throttled by tcp_push(), delaying the sending.
This can pathologically cause 40ms extra latency due to bad interactions with
delayed acks.
Adding a memory barrier in tcp_push before the sk_wmem_alloc read removes the
bug, similarly to the previous commit bf06200e732d ("tcp: tsq: fix nonagle
handling"). smp_mb__after_atomic() is used to not incur in unnecessary overhead
on x86 since not affected.
Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04 and
Apache Tomcat 9.0.83 running the basic servlet below:
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class HelloWorldServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
response.setContentType("text/html;charset=utf-8");
OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
String s = "a".repeat(3096);
osw.write(s,0,s.length());
osw.flush();
}
}
Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
values is observed while, with the patch, the extra latency disappears.
# No patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
...
50.000% 0.91ms
75.000% 1.13ms
90.000% 1.46ms
99.000% 1.74ms
99.900% 1.89ms
99.990% 41.95ms <<< 40+ ms extra latency
99.999% 48.32ms
100.000% 48.96ms
# With patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
...
50.000% 0.90ms
75.000% 1.13ms
90.000% 1.45ms
99.000% 1.72ms
99.900% 1.83ms
99.990% 2.11ms <<< no 40+ ms extra latency
99.999% 2.53ms
100.000% 2.62ms
Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
affected by this issue and the patch doesn't introduce any additional
delay.
Fixes: a181ceb501b3 ("tcp: autocork should not hold first packet in write
queue")
Signed-off-by: Salvatore Dipietro <dipiets@...zon.com>
---
net/ipv4/tcp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ff6838ca2e58..ab9e3922393c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -726,6 +726,7 @@ void tcp_push(struct sock *sk, int flags, int mss_now,
/* It is possible TX completion already happened
* before we set TSQ_THROTTLED.
*/
+ smp_mb__after_atomic();
if (refcount_read(&sk->sk_wmem_alloc) > skb->truesize)
return;
}
--
2.42.0
Powered by blists - more mailing lists