Message-ID: <20260117150346.72265ac3@kernel.org>
Date: Sat, 17 Jan 2026 15:03:46 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Eric Dumazet <edumazet@...gle.com>
Cc: kuniyu@...gle.com, ncardwell@...gle.com, netdev@...r.kernel.org,
 davem@...emloft.net, pabeni@...hat.com, andrew+netdev@...n.ch,
 horms@...nel.org
Subject: Re: [PATCH net-next] tcp: try to defer / return acked skbs to
 originating CPU

On Sat, 17 Jan 2026 19:16:57 +0100 Eric Dumazet wrote:
> On Sat, Jan 17, 2026 at 5:43 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > Running a memcache-like workload under production(ish) load
> > on a 300 thread AMD machine we see ~3% of CPU time spent
> > in kmem_cache_free() via tcp_ack(), freeing skbs from rtx queue.
> > This workload pins workers away from the softirq CPUs, so
> > the Tx skbs are pretty much always allocated on a different
> > CPU than the one where the ACKs arrive. Try to use the defer
> > skb free queue to return the skbs to where they came from.
> > This results in a ~4% performance improvement for the workload.
> 
> This probably makes sense when RFS is not used.
> Here, RFS gives us ~40% performance improvement for typical RPC workloads,
> so I never took a look at this side :)

This workload doesn't like RFS. Maybe because it has 1M sockets..
I'll need to look closer, the patchwork queue first tho.. :)

> Have you tested what happens for bulk sends ?
> sendmsg() allocates skbs and pushes them to the transmit queue,
> but an ACK can decide to split TSO packets, and the new allocation is
> done on the softirq CPU (assuming RFS is not used).
> 
> Perhaps tso_fragment()/tcp_fragment() could copy the source
> skb->alloc_cpu to (new)buff->alloc_cpu.

I'll do some synthetic testing and get back.
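
If it really is just a matter of propagating the hint, something like the
below might be enough (completely untested, hand-written sketch; the exact
context in tcp_fragment()/tso_fragment() may well differ):

```
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ tcp_fragment()
 	/* buff is the newly allocated second fragment */
+	buff->alloc_cpu = skb->alloc_cpu;
```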

> Also, if workers are away from softirq, they will only process the
> defer queue in large batches, after receiving a trigger_rx_softirq()
> IPI.
> Any idea of skb_defer_free_flush() latency when dealing with batches
> of ~64 big TSO packets ?

Not sure if there's much we can do about that.. Perhaps we should have 
a shrinker that flushes the defer queues? I chatted with Shakeel briefly
and it sounded fairly straightforward.
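
Roughly what I had in mind, as an untested, not-compile-checked sketch
(skb_defer_queued is a hypothetical counter of deferred skbs that doesn't
exist today; the flush mechanism is hand-waved via the existing IPI):

```c
/* Untested sketch: a shrinker that flushes the per-CPU skb defer lists
 * under memory pressure.
 */
static unsigned long skb_defer_count(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	/* would need a cheap global/percpu count of deferred skbs */
	return atomic_long_read(&skb_defer_queued);
}

static unsigned long skb_defer_scan(struct shrinker *shrink,
				    struct shrink_control *sc)
{
	int cpu;

	/* kick each CPU to run skb_defer_free_flush() */
	for_each_online_cpu(cpu)
		kick_defer_list_purge(cpu);	/* hypothetical helper */
	return SHRINK_STOP;
}

static int __init skb_defer_shrinker_init(void)
{
	struct shrinker *s = shrinker_alloc(0, "skb-defer");

	if (!s)
		return -ENOMEM;
	s->count_objects = skb_defer_count;
	s->scan_objects = skb_defer_scan;
	shrinker_register(s);
	return 0;
}
```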
