netdev - Re: [PATCH net-next] tcp: try to defer / return acked skbs to originating CPU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iJ8+5OaWS2VzJqo4QVN6VY9zJvrJfP0TGRGv85mj09kjA@mail.gmail.com>
Date: Sun, 18 Jan 2026 13:15:00 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: kuniyu@...gle.com, ncardwell@...gle.com, netdev@...r.kernel.org, 
	davem@...emloft.net, pabeni@...hat.com, andrew+netdev@...n.ch, 
	horms@...nel.org
Subject: Re: [PATCH net-next] tcp: try to defer / return acked skbs to
 originating CPU

On Sun, Jan 18, 2026 at 12:03 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Sat, 17 Jan 2026 19:16:57 +0100 Eric Dumazet wrote:
> > On Sat, Jan 17, 2026 at 5:43 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > Running a memcache-like workload under production(ish) load
> > > on a 300 thread AMD machine we see ~3% of CPU time spent
> > > in kmem_cache_free() via tcp_ack(), freeing skbs from rtx queue.
> > > This workloads pins workers away from softirq CPU so
> > > the Tx skbs are pretty much always allocated on a different
> > > CPU than where the ACKs arrive. Try to use the defer skb free
> > > queue to return the skbs back to where they came from.
> > > This results in a ~4% performance improvement for the workload.
> >
> > This probably makes sense when RFS is not used.
> > Here, RFS gives us ~40% performance improvement for typical RPC workloads,
> > so I never took a look at this side :)
>
> This workload doesn't like RFS. Maybe because it has 1M sockets..
> I'll need to look closer, the patchwork queue first tho.. :)
>
> > Have you tested what happens for bulk sends ?
> > sendmsg() allocates skbs and push them to transmit queue,
> > but ACK can decide to split TSO packets, and the new allocation is done
> > on the softirq CPU (assuming RFS is not used)
> >
> > Perhaps tso_fragment()/tcp_fragment() could copy the source
> > skb->alloc_cpu to (new)buff->alloc_cpu.
>
> I'll do some synthetic testing and get back.
>
> > Also, if workers are away from softirq, they will only process the
> > defer queue in large patches, after receiving an trigger_rx_softirq()
> > IPI.
> > Any idea of skb_defer_free_flush() latency when dealing with batches
> > of ~64 big TSO packets ?
>
> Not sure if there's much we can do about that.. Perhaps we should have
> a shrinker that flushes the defer queues? I chatted with Shakeel briefly
> and it sounded fairly straightforward.

I was mostly concerned about latency spikes, I did some tests here and
this seems fine.
(I assume you asked Shakeel about the extra memory being held in the
per-cpu queue, and pcp implications ?)

Reviewed-by: Eric Dumazet <edumazet@...gle.com>