lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJvyYjiweCESQL8E-Si7M=gosYvh1BAVWwAWycXW8GSdg@mail.gmail.com>
Date: Tue, 1 Jul 2025 09:37:36 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: xu.xin16@....com.cn
Cc: kuba@...nel.org, kuniyu@...zon.com, ncardwell@...gle.com, 
	davem@...emloft.net, horms@...nel.org, dsahern@...nel.org, 
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org, fan.yu9@....com.cn
Subject: Re: [PATCH net-next v2] tcp: extend tcp_retransmit_skb tracepoint
 with failure reasons

On Tue, Jul 1, 2025 at 2:42 AM <xu.xin16@....com.cn> wrote:
>
> From: Fan Yu <fan.yu9@....com.cn>
>
> Background
> ==========
> When TCP retransmits a packet due to missing ACKs, the
> retransmission may fail for various reasons (e.g., packets
> stuck in driver queues, sequence errors, or routing issues).
>
> The original tcp_retransmit_skb tracepoint:
> 'commit e086101b150a ("tcp: add a tracepoint for tcp retransmission")'
> lacks visibility into these failure causes, making production
> diagnostics difficult.
>
> Solution
> =======
> Adds a 'reason' field to the tcp_retransmit_skb tracepoint,
> enumerating with explicit failure cases:
> TCP_RETRANS_ERR_DEFAULT (retransmit terminate unexpectedly)
> TCP_RETRANS_IN_HOST_QUEUE (packet still queued in driver)
> TCP_RETRANS_END_SEQ_ERROR (invalid end sequence)
> TCP_RETRANS_TRIM_HEAD_NOMEM (trim head no memory)
> TCP_RETRANS_UNCLONE_NOMEM (skb unclone keeptruesize no memory)
> TCP_RETRANS_FRAG_NOMEM (fragment no memory)
> TCP_RETRANS_ROUTE_FAIL (routing failure)
> TCP_RETRANS_RCV_ZERO_WINDOW (closed recevier window)
> TCP_RETRANS_PSKB_COPY_NOBUFS (no buffer for skb copy)
>
> Functionality
> =============
> Enables users to know why some tcp retransmission quitted and filter
> retransmission failures by reason.
>
>

...

> +enum tcp_retransmit_quit_reason {
> +       TCP_RETRANS_ERR_DEFAULT,
> +       TCP_RETRANS_SUCCESS,
> +       TCP_RETRANS_IN_HOST_QUEUE,
> +       TCP_RETRANS_END_SEQ_ERROR,
> +       TCP_RETRANS_TRIM_HEAD_NOMEM,
> +       TCP_RETRANS_UNCLONE_NOMEM,
> +       TCP_RETRANS_FRAG_NOMEM,
> +       TCP_RETRANS_ROUTE_FAIL,
> +       TCP_RETRANS_RCV_ZERO_WINDOW,
> +       TCP_RETRANS_PSKB_COPY_NOBUFS,
> +};
> +
>  #define tcp_sk(ptr) container_of_const(ptr, struct tcp_sock, inet_conn.icsk_inet.sk)
>
>  /* Variant of tcp_sk() upgrading a const sock to a read/write tcp socket.
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 54e60c6009e3..3e24740d759e 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -13,17 +13,38 @@
>  #include <linux/sock_diag.h>
>  #include <net/rstreason.h>
>
> -/*
> - * tcp event with arguments sk and skb
> - *
> - * Note: this class requires a valid sk pointer; while skb pointer could
> - *       be NULL.
> - */
> -DECLARE_EVENT_CLASS(tcp_event_sk_skb,
> +#define TCP_RETRANSMIT_QUIT_REASON             \
> +               ENUM(TCP_RETRANS_ERR_DEFAULT,           "retransmit terminate unexpectedly")    \
> +               ENUM(TCP_RETRANS_SUCCESS,               "retransmit successfully")              \
> +               ENUM(TCP_RETRANS_IN_HOST_QUEUE,         "packet still queued in driver")        \
> +               ENUM(TCP_RETRANS_END_SEQ_ERROR,         "invalid end sequence")                 \
> +               ENUM(TCP_RETRANS_TRIM_HEAD_NOMEM,       "trim head no memory")                  \
> +               ENUM(TCP_RETRANS_UNCLONE_NOMEM,         "skb unclone keeptruesize no memory")   \
> +               ENUM(TCP_RETRANS_FRAG_NOMEM,            "fragment no memory")                   \

Do we really need 3 + 1 different 'NOMEMORY' status ?

> +               ENUM(TCP_RETRANS_ROUTE_FAIL,            "routing failure")                      \
> +               ENUM(TCP_RETRANS_RCV_ZERO_WINDOW,       "closed recevier window")               \

receiver

> +               ENUMe(TCP_RETRANS_PSKB_COPY_NOBUFS,     "no buffer for skb copy")               \

-> another NOMEM...

> +
> +


> +               __entry->quit_reason = quit_reason;
>         ),
>
> -       TP_printk("skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s",
> +       TP_printk("skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s quit_reason=%s",

quit_reason is really weird, since most retransmits are a success.

What about using : status or result ?

Also, for scripts parsing the output, I would try to keep  key=val
format (no space in @val), and concise 'vals'

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ