[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJvyYjiweCESQL8E-Si7M=gosYvh1BAVWwAWycXW8GSdg@mail.gmail.com>
Date: Tue, 1 Jul 2025 09:37:36 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: xu.xin16@....com.cn
Cc: kuba@...nel.org, kuniyu@...zon.com, ncardwell@...gle.com,
davem@...emloft.net, horms@...nel.org, dsahern@...nel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, fan.yu9@....com.cn
Subject: Re: [PATCH net-next v2] tcp: extend tcp_retransmit_skb tracepoint
with failure reasons
On Tue, Jul 1, 2025 at 2:42 AM <xu.xin16@....com.cn> wrote:
>
> From: Fan Yu <fan.yu9@....com.cn>
>
> Background
> ==========
> When TCP retransmits a packet due to missing ACKs, the
> retransmission may fail for various reasons (e.g., packets
> stuck in driver queues, sequence errors, or routing issues).
>
> The original tcp_retransmit_skb tracepoint:
> 'commit e086101b150a ("tcp: add a tracepoint for tcp retransmission")'
> lacks visibility into these failure causes, making production
> diagnostics difficult.
>
> Solution
> =======
> Adds a 'reason' field to the tcp_retransmit_skb tracepoint,
> enumerating with explicit failure cases:
> TCP_RETRANS_ERR_DEFAULT (retransmit terminate unexpectedly)
> TCP_RETRANS_IN_HOST_QUEUE (packet still queued in driver)
> TCP_RETRANS_END_SEQ_ERROR (invalid end sequence)
> TCP_RETRANS_TRIM_HEAD_NOMEM (trim head no memory)
> TCP_RETRANS_UNCLONE_NOMEM (skb unclone keeptruesize no memory)
> TCP_RETRANS_FRAG_NOMEM (fragment no memory)
> TCP_RETRANS_ROUTE_FAIL (routing failure)
> TCP_RETRANS_RCV_ZERO_WINDOW (closed recevier window)
> TCP_RETRANS_PSKB_COPY_NOBUFS (no buffer for skb copy)
>
> Functionality
> =============
> Enables users to know why some tcp retransmission quitted and filter
> retransmission failures by reason.
>
>
...
> +enum tcp_retransmit_quit_reason {
> + TCP_RETRANS_ERR_DEFAULT,
> + TCP_RETRANS_SUCCESS,
> + TCP_RETRANS_IN_HOST_QUEUE,
> + TCP_RETRANS_END_SEQ_ERROR,
> + TCP_RETRANS_TRIM_HEAD_NOMEM,
> + TCP_RETRANS_UNCLONE_NOMEM,
> + TCP_RETRANS_FRAG_NOMEM,
> + TCP_RETRANS_ROUTE_FAIL,
> + TCP_RETRANS_RCV_ZERO_WINDOW,
> + TCP_RETRANS_PSKB_COPY_NOBUFS,
> +};
> +
> #define tcp_sk(ptr) container_of_const(ptr, struct tcp_sock, inet_conn.icsk_inet.sk)
>
> /* Variant of tcp_sk() upgrading a const sock to a read/write tcp socket.
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 54e60c6009e3..3e24740d759e 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -13,17 +13,38 @@
> #include <linux/sock_diag.h>
> #include <net/rstreason.h>
>
> -/*
> - * tcp event with arguments sk and skb
> - *
> - * Note: this class requires a valid sk pointer; while skb pointer could
> - * be NULL.
> - */
> -DECLARE_EVENT_CLASS(tcp_event_sk_skb,
> +#define TCP_RETRANSMIT_QUIT_REASON \
> + ENUM(TCP_RETRANS_ERR_DEFAULT, "retransmit terminate unexpectedly") \
> + ENUM(TCP_RETRANS_SUCCESS, "retransmit successfully") \
> + ENUM(TCP_RETRANS_IN_HOST_QUEUE, "packet still queued in driver") \
> + ENUM(TCP_RETRANS_END_SEQ_ERROR, "invalid end sequence") \
> + ENUM(TCP_RETRANS_TRIM_HEAD_NOMEM, "trim head no memory") \
> + ENUM(TCP_RETRANS_UNCLONE_NOMEM, "skb unclone keeptruesize no memory") \
> + ENUM(TCP_RETRANS_FRAG_NOMEM, "fragment no memory") \
Do we really need 3 + 1 different 'NOMEMORY' status ?
> + ENUM(TCP_RETRANS_ROUTE_FAIL, "routing failure") \
> + ENUM(TCP_RETRANS_RCV_ZERO_WINDOW, "closed recevier window") \
receiver
> + ENUMe(TCP_RETRANS_PSKB_COPY_NOBUFS, "no buffer for skb copy") \
-> another NOMEM...
> +
> +
> + __entry->quit_reason = quit_reason;
> ),
>
> - TP_printk("skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s",
> + TP_printk("skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s quit_reason=%s",
quit_reason is really weird, since most retransmits are a success.
What about using : status or result ?
Also, for scripts parsing the output, I would try to keep key=val
format (no space in @val), and concise 'vals'
Powered by blists - more mailing lists