netdev - Re: [PATCH v4] tcp: Add memory barrier to tcp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a334775ce779be6b65928839dbc63fefc5d04086.camel@redhat.com>
Date: Mon, 22 Jan 2024 15:40:48 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Salvatore Dipietro <dipiets@...zon.com>, edumazet@...gle.com, 
	davem@...emloft.net, dsahern@...nel.org, kuba@...nel.org
Cc: netdev@...r.kernel.org, blakgeof@...zon.com, alisaidi@...zon.com, 
	benh@...zon.com, dipietro.salvatore@...il.com
Subject: Re: [PATCH v4] tcp: Add memory barrier to tcp_push()

On Fri, 2024-01-19 at 11:01 -0800, Salvatore Dipietro wrote:
> On CPUs with weak memory models, reads and updates performed by tcp_push
> to the sk variables can get reordered leaving the socket throttled when
> it should not. The tasklet running tcp_wfree() may also not observe the
> memory updates in time and will skip flushing any packets throttled by
> tcp_push(), delaying the sending. This can pathologically cause 40ms
> extra latency due to bad interactions with delayed acks.
> 
> Adding a memory barrier in tcp_push removes the bug, similarly to the
> previous commit bf06200e732d ("tcp: tsq: fix nonagle handling").
> smp_mb__after_atomic() is used to not incur in unnecessary overhead
> on x86 since not affected.
> 
> Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
> 22.04 and Apache Tomcat 9.0.83 running the basic servlet below:
> 
> import java.io.IOException;
> import java.io.OutputStreamWriter;
> import java.io.PrintWriter;
> import javax.servlet.ServletException;
> import javax.servlet.http.HttpServlet;
> import javax.servlet.http.HttpServletRequest;
> import javax.servlet.http.HttpServletResponse;
> 
> public class HelloWorldServlet extends HttpServlet {
>     @Override
>     protected void doGet(HttpServletRequest request, HttpServletResponse response)
>       throws ServletException, IOException {
>         response.setContentType("text/html;charset=utf-8");
>         OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
>         String s = "a".repeat(3096);
>         osw.write(s,0,s.length());
>         osw.flush();
>     }
> }
> 
> Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
> c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
> values is observed while, with the patch, the extra latency disappears.
> 
> No patch and tcp_autocorking=1
> ./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
>   ...
>  50.000%    0.91ms
>  75.000%    1.13ms
>  90.000%    1.46ms
>  99.000%    1.74ms
>  99.900%    1.89ms
>  99.990%   41.95ms  <<< 40+ ms extra latency
>  99.999%   48.32ms
> 100.000%   48.96ms
> 
> With patch and tcp_autocorking=1
> ./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
>   ...
>  50.000%    0.90ms
>  75.000%    1.13ms
>  90.000%    1.45ms
>  99.000%    1.72ms
>  99.900%    1.83ms
>  99.990%    2.11ms  <<< no 40+ ms extra latency
>  99.999%    2.53ms
> 100.000%    2.62ms
> 
> Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
> affected by this issue and the patch doesn't introduce any additional
> delay.
> 
> Fixes: 7aa5470c2c09 ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
> Signed-off-by: Salvatore Dipietro <dipiets@...zon.com>

Thank you for the great analysis and the extra iteration! This was
completely non trivial to me. 

The patch LGTM 

Acked-by: Paolo Abeni <pabeni@...hat.com>

I hope to see you both (Salvatore and Geoff) more often on the ML.

Cheers,

Paolo