Message-ID: <87ecuyn5x2.fsf@cloudflare.com>
Date: Wed, 02 Jul 2025 14:17:13 +0200
From: Jakub Sitnicki <jakub@...udflare.com>
To: Cong Wang <xiyou.wangcong@...il.com>, zijianzhang@...edance.com
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, john.fastabend@...il.com,
zhoufeng.zf@...edance.com, Amery Hung <amery.hung@...edance.com>, Cong
Wang <cong.wang@...edance.com>
Subject: Re: [Patch bpf-next v4 4/4] tcp_bpf: improve ingress redirection
performance with message corking
On Mon, Jun 30, 2025 at 06:12 PM -07, Cong Wang wrote:
> From: Zijian Zhang <zijianzhang@...edance.com>
>
> The TCP_BPF ingress redirection path currently lacks the message corking
> mechanism found in standard TCP. This causes the sender to wake up the
> receiver for every message, even when messages are small, resulting in
> reduced throughput compared to regular TCP in certain scenarios.
I'm curious which scenarios you are referring to. Is it send-to-local
or ingress-to-local? [1]
If the sender is emitting small messages, that's probably intentional -
that is, they likely want to get each message across as soon as
possible. They must have disabled the Nagle algorithm (set TCP_NODELAY)
to do that.
Otherwise, you get small segment merging on the sender side by default.
And if MTU is a limiting factor, you should also be getting batching
from GRO.
What I'm getting at is that I don't quite follow why you don't see
sufficient batching before the sockmap redirect today?
> This change introduces a kernel worker-based intermediate layer to provide
> automatic message corking for TCP_BPF. While this adds a slight latency
> overhead, it significantly improves overall throughput by reducing
> unnecessary wake-ups and reducing the sock lock contention.
"Slight" for a +5% increase in latency is an understatement :-)
IDK about this being always on for every socket. For send-to-local
[1], sk_msg redirs can be viewed as a form of IPC, where latency
matters.
I do understand that you're trying to optimize for bulk-transfer
workloads, but please consider also request-response workloads.
[1] https://github.com/jsitnicki/kubecon-2024-sockmap/blob/main/cheatsheet-sockmap-redirect.png
> Reviewed-by: Amery Hung <amery.hung@...edance.com>
> Co-developed-by: Cong Wang <cong.wang@...edance.com>
> Signed-off-by: Cong Wang <cong.wang@...edance.com>
> Signed-off-by: Zijian Zhang <zijianzhang@...edance.com>
> ---