Message-ID: <CACSApvaDEvRFdvd7X_-Jcw+uUHD775uM2TyVg383ecLE2CMV8g@mail.gmail.com>
Date: Thu, 7 Sep 2023 10:35:41 -0400
From: Soheil Hassas Yeganeh <soheil@...gle.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Eric Dumazet <edumazet@...gle.com>, "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
eric.dumazet@...il.com, Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [RFC net-next 4/4] tcp: defer regular ACK while processing socket backlog
On Thu, Sep 7, 2023 at 10:08 AM Neal Cardwell <ncardwell@...gle.com> wrote:
>
> On Wed, Sep 6, 2023 at 4:10 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > This idea came after a particular workload requested
> > the quickack attribute set on routes, and a performance
> > drop was noticed for large bulk transfers.
> >
> > For high throughput flows, it is best to use one cpu
> > running the user thread issuing socket system calls,
> > and a separate cpu to process incoming packets from BH context.
> > (With TSO/GRO, bottleneck is usually the 'user' cpu)
> >
> > Problem is the user thread can spend a lot of time while holding
> > the socket lock, forcing BH handler to queue most of incoming
> > packets in the socket backlog.
> >
> > Whenever the user thread releases the socket lock, it must first
> > process all accumulated packets in the backlog, potentially
> > adding latency spikes. Due to flood mitigation, having too many
> > packets in the backlog increases chance of unexpected drops.
> >
> > Backlog processing unfortunately shifts a fair amount of cpu cycles
> > from the BH cpu to the 'user' cpu, thus reducing max throughput.
> >
> > This patch takes advantage of the backlog processing,
> > and the fact that ACK are mostly cumulative.
> >
> > The idea is to detect we are in the backlog processing
> > and defer all eligible ACK into a single one,
> > sent from tcp_release_cb().
> >
> > This saves cpu cycles on both sides, and network resources.
> >
> > Performance of a single TCP flow on a 200Gbit NIC:
> >
> > - Throughput is increased by 20% (100Gbit -> 120Gbit).
> > - Number of generated ACK per second shrinks from 240,000 to 40,000.
> > - Number of backlog drops per second shrinks from 230 to 0.
> >
> > Benchmark context:
> > - Regular netperf TCP_STREAM (no zerocopy)
> > - Intel(R) Xeon(R) Platinum 8481C (Sapphire Rapids)
> > - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)
> >
> > This feature is guarded by a new sysctl, and enabled by default:
> > /proc/sys/net/ipv4/tcp_backlog_ack_defer
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>
> Acked-by: Neal Cardwell <ncardwell@...gle.com>
>
> Yet another fantastic optimization. Thanks, Eric!

Acked-by: Soheil Hassas Yeganeh <soheil@...gle.com>

This is really superb! Thank you, Eric!

> neal
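
For readers who want to see the effect of the deferral described in the quoted patch, here is a toy model, not kernel code. It assumes every backlog segment arrives in order and that, without deferral, each segment triggers its own ACK (as with quickack). The function name acks_for_backlog and the segment sizes are made up for illustration; only the idea of one cumulative ACK sent once the backlog is drained comes from the patch description.

```python
from typing import List


def acks_for_backlog(segment_lengths: List[int], defer: bool) -> List[int]:
    """Return the cumulative ACK numbers emitted while draining a backlog.

    Toy model only: in-order segments, starting sequence number 0.
    defer=False -> one ACK per segment (quickack-like behaviour).
    defer=True  -> ACKs are held back and a single cumulative ACK is
                   emitted once the whole backlog has been processed,
                   loosely mirroring an ACK sent from a release callback.
    """
    rcv_nxt = 0            # next expected sequence number
    acks: List[int] = []   # ACK numbers actually sent
    ack_pending = False    # set when an ACK was deferred

    for seg_len in segment_lengths:
        rcv_nxt += seg_len
        if defer:
            ack_pending = True       # remember that an ACK is owed
        else:
            acks.append(rcv_nxt)     # ACK immediately

    # End of backlog processing: flush the deferred ACK, if any.
    if ack_pending:
        acks.append(rcv_nxt)
    return acks


if __name__ == "__main__":
    backlog = [60_000] * 6  # six ~60KB GRO packets, illustrative only
    print(len(acks_for_backlog(backlog, defer=False)), "ACKs without deferral")
    print(len(acks_for_backlog(backlog, defer=True)), "ACK with deferral")
```

Because ACKs are cumulative, the single deferred ACK carries the same final acknowledgment number as the last of the per-segment ACKs, which is why the sender loses no information while both sides handle fewer packets.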