[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iLmx=u9_==xr-2OfZRA-B3DQE11_Oz3uP-DNLH7k-HwxQ@mail.gmail.com>
Date: Thu, 18 Jan 2024 11:42:40 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Salvatore Dipietro <dipiets@...zon.com>, alisaidi@...zon.com, benh@...zon.com,
blakgeof@...zon.com, davem@...emloft.net, dipietro.salvatore@...il.com,
dsahern@...nel.org, kuba@...nel.org, netdev@...r.kernel.org
Subject: Re: [PATCH v3] tcp: Add memory barrier to tcp_push()
On Thu, Jan 18, 2024 at 11:38 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Wed, 2024-01-17 at 15:16 -0800, Salvatore Dipietro wrote:
> > On CPUs with weak memory models, reads and updates performed by tcp_push to the
> > sk variables can get reordered leaving the socket throttled when it should not.
> > The tasklet running tcp_wfree() may also not observe the memory updates in time
> > and will skip flushing any packets throttled by tcp_push(), delaying the sending.
> > This can pathologically cause 40ms extra latency due to bad interactions with
> > delayed acks.
> >
> > Adding a memory barrier in tcp_push before the sk_wmem_alloc read removes the
> > bug, similarly to the previous commit bf06200e732d ("tcp: tsq: fix nonagle
> > handling"). smp_mb__after_atomic() is used to not incur in unnecessary overhead
> > on x86 since not affected.
> >
> > Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04 and
> > Apache Tomcat 9.0.83 running the basic servlet below:
> >
> > import java.io.IOException;
> > import java.io.OutputStreamWriter;
> > import java.io.PrintWriter;
> > import javax.servlet.ServletException;
> > import javax.servlet.http.HttpServlet;
> > import javax.servlet.http.HttpServletRequest;
> > import javax.servlet.http.HttpServletResponse;
> >
> > public class HelloWorldServlet extends HttpServlet {
> > @Override
> > protected void doGet(HttpServletRequest request, HttpServletResponse response)
> > throws ServletException, IOException {
> > response.setContentType("text/html;charset=utf-8");
> > OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
> > String s = "a".repeat(3096);
> > osw.write(s,0,s.length());
> > osw.flush();
> > }
> > }
> >
> > Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
> > c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
> > values is observed while, with the patch, the extra latency disappears.
> >
> > # No patch and tcp_autocorking=1
> > ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
> > ...
> > 50.000% 0.91ms
> > 75.000% 1.13ms
> > 90.000% 1.46ms
> > 99.000% 1.74ms
> > 99.900% 1.89ms
> > 99.990% 41.95ms <<< 40+ ms extra latency
> > 99.999% 48.32ms
> > 100.000% 48.96ms
> >
> > # With patch and tcp_autocorking=1
> > ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
> > ...
> > 50.000% 0.90ms
> > 75.000% 1.13ms
> > 90.000% 1.45ms
> > 99.000% 1.72ms
> > 99.900% 1.83ms
> > 99.990% 2.11ms <<< no 40+ ms extra latency
> > 99.999% 2.53ms
> > 100.000% 2.62ms
> >
> > Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
> > affected by this issue and the patch doesn't introduce any additional
> > delay.
> >
> > Fixes: a181ceb501b3 ("tcp: autocork should not hold first packet in write
> > queue")
>
> Please read carefully the process documentation under
> Documentation/process/ and specifically the netdev specific bits:
>
> no resubmissions within the 24h grace period.
>
> Please double-check your patch with checkpatch for formal errors: the
> fixes tag must not be split across multiple lines.
>
> And please do not sent new version in reply to a previous one: it will
> confuse the bot.
>
> > Signed-off-by: Salvatore Dipietro <dipiets@...zon.com>
> > ---
> > net/ipv4/tcp.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index ff6838ca2e58..ab9e3922393c 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -726,6 +726,7 @@ void tcp_push(struct sock *sk, int flags, int mss_now,
> > /* It is possible TX completion already happened
> > * before we set TSQ_THROTTLED.
> > */
> > + smp_mb__after_atomic();
>
> Out of sheer ignorance I'm wondering if moving such barrier inside the
> above 'if' just after 'set_bit' would suffice?
I think this would work just fine.
>
> Thanks!
>
> Paolo
>
Powered by blists - more mailing lists