netdev - Re: [RFC net-next 4/4] tcp: defer regular ACK while processing socket backlog

Message-ID: <CADVnQynzDw6JK7CTtyTrNWiYENmOi12i9XXpEQ-+eB-dEg3fvQ@mail.gmail.com>
Date: Thu, 7 Sep 2023 10:07:59 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, eric.dumazet@...il.com, 
	Soheil Hassas Yeganeh <soheil@...gle.com>, Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [RFC net-next 4/4] tcp: defer regular ACK while processing socket backlog

On Wed, Sep 6, 2023 at 4:10 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> This idea came after a particular workload requested
> the quickack attribute set on routes, and a performance
> drop was noticed for large bulk transfers.
>
> For high throughput flows, it is best to use one cpu
> running the user thread issuing socket system calls,
> and a separate cpu to process incoming packets from BH context.
> (With TSO/GRO, bottleneck is usually the 'user' cpu)
>
> Problem is the user thread can spend a lot of time while holding
> the socket lock, forcing BH handler to queue most of incoming
> packets in the socket backlog.
>
> Whenever the user thread releases the socket lock, it must first
> process all accumulated packets in the backlog, potentially
> adding latency spikes. Due to flood mitigation, having too many
> packets in the backlog increases chance of unexpected drops.
>
> Backlog processing unfortunately shifts a fair amount of cpu cycles
> from the BH cpu to the 'user' cpu, thus reducing max throughput.
>
> This patch takes advantage of the backlog processing,
> and the fact that ACK are mostly cumulative.
>
> The idea is to detect we are in the backlog processing
> and defer all eligible ACK into a single one,
> sent from tcp_release_cb().
>
> This saves cpu cycles on both sides, and network resources.
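The effect of relying on cumulative ACKs can be illustrated with a toy model. This is only a sketch, not the kernel logic: the function names and the (seq, length) tuple layout are invented for illustration, segments are assumed to arrive in order, and the patch's notion of which ACKs are "eligible" for deferral is not modelled.

```python
def acks_immediate(segments, rcv_nxt=0):
    """Hypothetical model: one cumulative ACK per processed segment."""
    acks = []
    for seq, length in segments:
        if seq == rcv_nxt:      # in-order data advances the receive point
            rcv_nxt += length
        acks.append(rcv_nxt)    # each ACK carries the next expected byte
    return acks


def acks_deferred(segments, rcv_nxt=0):
    """Hypothetical model: drain the whole backlog, then send a single ACK."""
    for seq, length in segments:
        if seq == rcv_nxt:
            rcv_nxt += length
    return [rcv_nxt]            # one ACK covers everything received so far


# Four in-order 1000-byte segments queued in the backlog (made-up values).
backlog = [(0, 1000), (1000, 1000), (2000, 1000), (3000, 1000)]
print(acks_immediate(backlog))  # [1000, 2000, 3000, 4000]
print(acks_deferred(backlog))   # [4000]
```

Both variants end by acknowledging the same byte (4000), so in this model the sender learns the same final receive state from one ACK as from four.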
>
> Performance of a single TCP flow on a 200Gbit NIC:
>
> - Throughput is increased by 20% (100Gbit -> 120Gbit).
> - Number of generated ACK per second shrinks from 240,000 to 40,000.
> - Number of backlog drops per second shrinks from 230 to 0.
>
> Benchmark context:
>  - Regular netperf TCP_STREAM (no zerocopy)
>  - Intel(R) Xeon(R) Platinum 8481C (Sapphire Rapids)
>  - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)
>
> This feature is guarded by a new sysctl, and enabled by default:
>  /proc/sys/net/ipv4/tcp_backlog_ack_defer
>
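A minimal sketch for checking whether a running kernel exposes this sysctl, assuming the path quoted above. The helper name is hypothetical; on a kernel without the patch the file is simply absent and the helper returns None.

```python
from pathlib import Path

# Path taken from the patch description; present only on kernels carrying the patch.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_backlog_ack_defer")


def read_backlog_ack_defer(path=SYSCTL_PATH):
    """Return the sysctl value as an int, or None if the file is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


print(read_backlog_ack_defer())
```

Since the message says the feature is enabled by default, a patched kernel would be expected to report a non-zero value unless an administrator has changed it.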
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>

Acked-by: Neal Cardwell <ncardwell@...gle.com>

Yet another fantastic optimization. Thanks, Eric!

neal