Message-ID: <20250714164625.788f7044@kernel.org>
Date: Mon, 14 Jul 2025 16:46:25 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: <fan.yu9@....com.cn>
Cc: <edumazet@...gle.com>, <kuniyu@...zon.com>, <ncardwell@...gle.com>,
<davem@...emloft.net>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-trace-kernel@...r.kernel.org>,
<yang.yang29@....com.cn>, <xu.xin16@....com.cn>, <tu.qiang35@....com.cn>,
<jiang.kun2@....com.cn>
Subject: Re: [PATCH net-next v4] tcp: extend tcp_retransmit_skb tracepoint
with failure reasons
On Thu, 10 Jul 2025 10:01:38 +0800 (CST) fan.yu9@....com.cn wrote:
> Background
> ==========
> When TCP retransmits a packet due to missing ACKs, the
> retransmission may fail for various reasons (e.g., packets
> stuck in driver queues, sequence errors, or routing issues).
>
> The original tcp_retransmit_skb tracepoint:
> 'commit e086101b150a ("tcp: add a tracepoint for tcp retransmission")'
> lacks visibility into these failure causes, making production
> diagnostics difficult.
>
> Solution
> ========
> Add a "result" field to the tcp_retransmit_skb tracepoint,
> enumerating explicit failure cases:
> TCP_RETRANS_ERR_DEFAULT (retransmit terminated unexpectedly)
> TCP_RETRANS_IN_HOST_QUEUE (packet still queued in driver)
> TCP_RETRANS_END_SEQ_ERROR (invalid end sequence)
> TCP_RETRANS_NOMEM (retransmit no memory)
> TCP_RETRANS_ROUTE_FAIL (routing failure)
> TCP_RETRANS_RCV_ZERO_WINDOW (closed receiver window)
Have you tried using this, or analyzed which of these reasons
actually make sense to add? I'd venture a guess that IN_HOST_QUEUE
will dominate in datacenters. Maybe RCV_ZERO_WINDOW can happen.
Tracing ENOMEM is a waste of time, and so is this:
        if (unlikely(before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))) {
>>>>>           WARN_ON_ONCE(1);        <<<<<<<<
-               return -EINVAL;
+               result = TCP_RETRANS_END_SEQ_ERROR;
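For context on why this branch is already covered by the WARN: the condition fires only when the skb's end_seq lies entirely behind snd_una, which should not happen for an skb still in the retransmit queue. A minimal userspace sketch of that comparison, assuming the same signed-difference arithmetic as the kernel's before() helper. The names seq_before(), retrans_end_seq_is_stale() and seq_before_selfcheck() are made up for illustration and are not part of the patch or the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the kernel's before(): seq1 precedes seq2 when the
 * signed 32-bit difference is negative. This stays correct when the
 * sequence space wraps around 2^32. */
static inline bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical stand-in for the condition in the hunk above: true when
 * the segment ends before the first unacknowledged byte. */
static inline bool retrans_end_seq_is_stale(uint32_t end_seq, uint32_t snd_una)
{
	return seq_before(end_seq, snd_una);
}

/* Illustrative self-check, not kernel code. */
static inline void seq_before_selfcheck(void)
{
	/* Plain ordering. */
	assert(seq_before(100, 200));
	assert(!seq_before(200, 100));

	/* Ordering across wraparound: 0xfffffff0 precedes 0x10. */
	assert(seq_before(0xfffffff0u, 0x10u));

	/* end_seq == snd_una is not "before", so it does not trip. */
	assert(!retrans_end_seq_is_stale(1000, 1000));
	assert(retrans_end_seq_is_stale(999, 1000));
}
```

Since the kernel already emits a one-time warning on this path, a dedicated tracepoint reason for it would mostly duplicate that signal.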
--
pw-bot: cr