Message-ID: <b676baa0-2044-4a74-900d-f471620f2896@linux.dev>
Date: Tue, 20 Jan 2026 11:16:20 +0800
From: Leon Hwang <leon.hwang@...ux.dev>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Jesper Dangaard Brouer <hawk@...nel.org>,
 Ilias Apalodimas <ilias.apalodimas@...aro.org>,
 Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu
 <mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 "David S . Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, kerneljasonxing@...il.com,
 lance.yang@...ux.dev, jiayuan.chen@...ux.dev, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, Leon Huang Fu <leon.huangfu@...pee.com>
Subject: Re: [PATCH net-next v4] page_pool: Add page_pool_release_stalled
 tracepoint



On 20/1/26 00:37, Jakub Kicinski wrote:
> On Mon, 19 Jan 2026 18:21:19 +0800 Leon Hwang wrote:
>> Introduce a new tracepoint to track stalled page pool releases,
>> providing better observability for page pool lifecycle issues.
> 
> Sorry, I really want you to answer the questions from the last
> paragraph of:
> 
>  https://lore.kernel.org/netdev/20260104084347.5de3a537@kernel.org/

Let me share a concrete case where this tracepoint would have helped,
and why netlink notifications were not a good fit.

I encountered the 'pr_warn()' messages during Mellanox NIC flapping on a
system using the 'mlx5_core' driver (kernel 6.6). The root cause turned
out to be an application-level issue: the IBM/sarama “Client SeekBroker
Connection Leak” [1].

In short, some TCP sockets became orphaned while still holding FIN/ACK
skbs in their 'sk_receive_queue'. These skbs were holding inflight pages
from page pools. After NIC flapping, as long as those sockets were not
closed, the inflight pages could not be returned, and the corresponding
page pools could not be released. Once the orphaned sockets were
explicitly closed (as in [2]), the inflight pages were returned and the
page pools were eventually destroyed.

During the investigation, the dmesg output was noisy: there were many
inflight pages across multiple page pools, originating from many
orphaned sockets. This made it difficult to investigate and reason about
the issue using BPF tools.

In this scenario, a netlink notification does not seem like a good fit:

* The situation involved many page pools and many inflight pages.
* Emitting netlink notifications on each retry or stall would likely
generate a large volume of messages.
* What was needed was not a stream of notifications, but the ability to
observe and correlate page pool state over time.

A tracepoint fits this use case better. With a
'page_pool_release_stalled' tracepoint, it becomes straightforward to
use BPF tools to:

* Track which page pools are repeatedly stalled
* Correlate stalls with socket state, RX queues, or driver behavior
* Distinguish expected situations (e.g. orphaned sockets temporarily
holding pages) from genuine kernel or driver issues
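
As a rough sketch of the first point, the per-pool aggregation could
look like the following. It assumes each tracepoint event has already
been collected in user space as a (timestamp, pool id, inflight count)
record. That record layout and the 'repeatedly_stalled' helper are
hypothetical, for illustration only, and are not the tracepoint's
actual format:

```python
from collections import defaultdict

# Hypothetical event records: (timestamp_sec, pool_id, inflight).
# The layout is an assumption for illustration, not the tracepoint's
# actual field list.
def repeatedly_stalled(events, min_stalls=3):
    """Return {pool_id: (stall_count, last_inflight)} for pools that
    reported at least min_stalls stall events."""
    stats = defaultdict(lambda: [0, 0])
    for _ts, pool_id, inflight in sorted(events):
        stats[pool_id][0] += 1          # number of stall events seen
        stats[pool_id][1] = inflight    # most recent inflight count
    return {p: (c, i) for p, (c, i) in stats.items() if c >= min_stalls}

events = [
    (60, 1, 128), (120, 1, 128), (180, 1, 128),  # pool 1 never drains
    (60, 2, 16), (120, 2, 0),                    # pool 2 drains quickly
]
print(repeatedly_stalled(events))
```

With the sample data, only pool 1 is reported, since pool 2 stalled
fewer than three times.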

From my experience, this tracepoint complements the existing
netlink-based observability rather than duplicating it, while avoiding
the risk of excessive netlink traffic in pathological but realistic
scenarios such as NIC flapping combined with connection leaks.

Thanks,
Leon

[1] https://github.com/IBM/sarama/issues/3143
[2] https://github.com/IBM/sarama/pull/3384

