Message-ID: <d152d5fa-e846-48ba-96f4-77493996d099@huawei.com>
Date: Tue, 17 Jun 2025 14:33:41 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Ratheesh Kannoth <rkannoth@...vell.com>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
CC: <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
<pabeni@...hat.com>
Subject: Re: [RFC]Page pool buffers stuck in App's socket queue
On 2025/6/16 16:05, Ratheesh Kannoth wrote:
> Hi,
>
> Recently a customer faced a page pool leak issue and keeps getting the following message in
> the console:
> "page_pool_release_retry() stalled pool shutdown 1 inflight 60 sec"
>
> The customer runs a "ping" process in the background and then does an interface down/up via the "ip" command.
>
> The Marvell octeontx2 driver destroys all resources (including the page pool allocated for each queue of
> the net device) during the interface down event. The page pool destruction waits for all buffers
> allocated by that instance to be returned to the pool, hence the above message (if some buffers
> are stuck).
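
For context, the driver-side teardown that triggers this typically looks roughly like the
following (a minimal sketch with hypothetical names, not the actual octeontx2 code):

#include <net/page_pool/helpers.h>
#include <net/page_pool/types.h>

struct my_rxq {				/* hypothetical driver RX queue state */
	struct page_pool *pool;
	struct page **bufs;
	unsigned int ring_size;
};

static void my_rxq_teardown(struct my_rxq *rxq)
{
	unsigned int i;

	/* Return the pages the driver itself still owns in the RX ring. */
	for (i = 0; i < rxq->ring_size; i++)
		if (rxq->bufs[i])
			page_pool_put_full_page(rxq->pool, rxq->bufs[i], false);

	/*
	 * page_pool_destroy() frees the pool immediately only if the
	 * inflight count is zero; otherwise a delayed work keeps retrying
	 * and logs the "stalled pool shutdown" message seen above.
	 */
	page_pool_destroy(rxq->pool);
}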
>
> In the customer's scenario, the ping app opens both RAW and RAW6 sockets. Even though the customer pings
> only an IPv4 address, the RAW6 socket receives some IPv6 Router Advertisement messages which get generated
> in their network.
>
> [ 41.643448] raw6_local_deliver+0xc0/0x1d8
> [ 41.647539] ip6_protocol_deliver_rcu+0x60/0x490
> [ 41.652149] ip6_input_finish+0x48/0x70
> [ 41.655976] ip6_input+0x44/0xcc
> [ 41.659196] ip6_sublist_rcv_finish+0x48/0x68
> [ 41.663546] ip6_sublist_rcv+0x16c/0x22c
> [ 41.667460] ipv6_list_rcv+0xf4/0x12c
>
> Those packets will never get processed, and if the customer does an interface down/up, the page pool
> warnings are shown in the console.
>
> The customer was asking us for a mechanism to drain these sockets, as they don't want to kill their apps.
> The proposal is to have a debugfs entry which shows "pid last_processed_skb_time number_of_packets socket_fd/inode_number"
> for each raw6/raw4 socket created in the system, and
> any write to the debugfs entry (some specific command) will drain the socket.
>
> 1. Could you please comment on the proposal?
I would say the above is kind of working around the problem.
It would be good to fix the apps or fix the page_pool.
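
If such a drain hook were added anyway, its core would likely be little more than purging the
socket's receive queue, so the queued skbs (and the page pool pages they reference) are released
through the normal skb destruction path. A minimal sketch, with a hypothetical debugfs write
handler (raw_sk_drain_write is not an existing interface):

#include <linux/debugfs.h>
#include <linux/skbuff.h>
#include <net/sock.h>

static ssize_t raw_sk_drain_write(struct file *file, const char __user *buf,
				  size_t count, loff_t *ppos)
{
	/* simple_open() stashed the socket pointer when the file was created. */
	struct sock *sk = file->private_data;

	lock_sock(sk);
	/* Frees every queued skb; page pool pages are returned (or released)
	 * via the normal skb destruction path.
	 */
	skb_queue_purge(&sk->sk_receive_queue);
	release_sock(sk);

	return count;
}

static const struct file_operations raw_sk_drain_fops = {
	.open	= simple_open,
	.write	= raw_sk_drain_write,
};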
> 2. Could you suggest a better way?
For fixing the page_pool part, I would suggest keeping track of all the
inflight pages and detaching those pages from the page_pool when
page_pool_destroy() is called. The tracking part was [1]; unfortunately
the maintainers seemed to choose an easier way instead of a long-term
direction, see [2]. A rough conceptual sketch of the detach idea follows
the links below.
1. https://lore.kernel.org/all/20250307092356.638242-1-linyunsheng@huawei.com/
2. https://lore.kernel.org/all/20250409-page-pool-track-dma-v9-0-6a9ef2e0cba8@redhat.com/
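
The idea in [1] boils down to the pool remembering which pages are still out, so the destroy
path can unmap and detach them instead of stalling until they come back. This is a simplified
conceptual sketch only, with made-up structures, not the actual patchset:

#include <linux/dma-mapping.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct pp_tracked_page {		/* one entry per inflight page */
	struct list_head node;
	struct page *page;
	dma_addr_t dma;
};

struct pp_tracker {			/* would live inside the page_pool */
	spinlock_t lock;
	struct list_head inflight;
	struct device *dev;
};

/* Record a page when it is handed out to the driver/stack. */
static void pp_track(struct pp_tracker *t, struct pp_tracked_page *e)
{
	spin_lock_bh(&t->lock);
	list_add(&e->node, &t->inflight);
	spin_unlock_bh(&t->lock);
}

/*
 * Called from the destroy path: undo the DMA mappings of everything that
 * is still inflight so the pool (and the device) can go away now.  The
 * pages themselves are freed later by whoever still holds them, through
 * the normal put_page() path.
 */
static void pp_detach_inflight(struct pp_tracker *t)
{
	struct pp_tracked_page *e, *tmp;

	spin_lock_bh(&t->lock);
	list_for_each_entry_safe(e, tmp, &t->inflight, node) {
		dma_unmap_page(t->dev, e->dma, PAGE_SIZE, DMA_FROM_DEVICE);
		list_del(&e->node);
		kfree(e);
	}
	spin_unlock_bh(&t->lock);
}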