Message-ID: <CACSApvaDEvRFdvd7X_-Jcw+uUHD775uM2TyVg383ecLE2CMV8g@mail.gmail.com>
Date: Thu, 7 Sep 2023 10:35:41 -0400
From: Soheil Hassas Yeganeh <soheil@...gle.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Eric Dumazet <edumazet@...gle.com>, "David S . Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, 
	eric.dumazet@...il.com, Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [RFC net-next 4/4] tcp: defer regular ACK while processing socket backlog

On Thu, Sep 7, 2023 at 10:08 AM Neal Cardwell <ncardwell@...gle.com> wrote:
>
> On Wed, Sep 6, 2023 at 4:10 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > This idea came after a particular workload requested
> > the quickack attribute set on routes, and a performance
> > drop was noticed for large bulk transfers.
> >
> > For high throughput flows, it is best to use one cpu
> > running the user thread issuing socket system calls,
> > and a separate cpu to process incoming packets from BH context.
> > (With TSO/GRO, bottleneck is usually the 'user' cpu)
> >
> > Problem is the user thread can spend a lot of time while holding
> > the socket lock, forcing BH handler to queue most of incoming
> > packets in the socket backlog.
> >
> > Whenever the user thread releases the socket lock, it must first
> > process all accumulated packets in the backlog, potentially
> > adding latency spikes. Due to flood mitigation, having too many
> > packets in the backlog increases chance of unexpected drops.
> >
> > Backlog processing unfortunately shifts a fair amount of cpu cycles
> > from the BH cpu to the 'user' cpu, thus reducing max throughput.
> >
> > This patch takes advantage of the backlog processing,
> > and the fact that ACKs are mostly cumulative.
> >
> > The idea is to detect that we are in backlog processing
> > and defer all eligible ACKs into a single one,
> > sent from tcp_release_cb().
> >
> > This saves cpu cycles on both sides, and network resources.
> >
> > Performance of a single TCP flow on a 200Gbit NIC:
> >
> > - Throughput is increased by 20% (100Gbit -> 120Gbit).
> > - Number of generated ACK per second shrinks from 240,000 to 40,000.
> > - Number of backlog drops per second shrinks from 230 to 0.
> >
> > Benchmark context:
> >  - Regular netperf TCP_STREAM (no zerocopy)
> >  - Intel(R) Xeon(R) Platinum 8481C (Sapphire Rapids)
> >  - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)
> >
> > This feature is guarded by a new sysctl, and enabled by default:
> >  /proc/sys/net/ipv4/tcp_backlog_ack_defer
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>
> Acked-by: Neal Cardwell <ncardwell@...gle.com>
>
> Yet another fantastic optimization. Thanks, Eric!

Acked-by: Soheil Hassas Yeganeh <soheil@...gle.com>

This is really superb!  Thank you, Eric!


> neal
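
For readers who want to see why deferring helps, here is a toy model of the
idea described in the quoted commit message: while the backlog is being
drained, eligible ACKs are only marked as pending, and a single cumulative ACK
is sent once the drain is finished. This is a sketch and not the kernel code.
ToyReceiver, release_sock() and simulate() are hypothetical names, and the
model assumes in-order segments that would each trigger an immediate ACK (as
with quickack); the real patch sends the deferred ACK from tcp_release_cb().

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyReceiver:
    """Toy receiver model (hypothetical, not kernel code)."""
    defer_backlog_acks: bool          # models the new sysctl being on or off
    rcv_nxt: int = 0                  # next expected byte sequence number
    ack_pending: bool = False         # a deferred ACK is owed to the sender
    acks_sent: List[int] = field(default_factory=list)

    def _send_ack(self) -> None:
        # ACKs are cumulative: one ACK covers every byte up to rcv_nxt.
        self.acks_sent.append(self.rcv_nxt)
        self.ack_pending = False

    def _receive(self, seg_len: int, in_backlog: bool) -> None:
        self.rcv_nxt += seg_len
        if self.defer_backlog_acks and in_backlog:
            # Assumption: every ACK in this model is eligible for deferral.
            self.ack_pending = True
        else:
            self._send_ack()

    def release_sock(self, backlog: List[int]) -> None:
        # Drain the backlog that piled up while the socket lock was held,
        # then flush at most one deferred ACK.
        for seg_len in backlog:
            self._receive(seg_len, in_backlog=True)
        if self.ack_pending:
            self._send_ack()


def simulate(backlog: List[int], defer: bool) -> List[int]:
    """Return the ACK sequence numbers sent while draining the backlog."""
    rx = ToyReceiver(defer_backlog_acks=defer)
    rx.release_sock(backlog)
    return rx.acks_sent


if __name__ == "__main__":
    backlog = [60_000] * 6            # six ~60KB GRO packets
    print(len(simulate(backlog, defer=False)))   # 6 ACKs, one per packet
    print(len(simulate(backlog, defer=True)))    # 1 cumulative ACK
```

In this model the last ACK carries the same sequence number either way, so the
sender learns the same thing from one ACK that it would from six.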
