Message-ID: <825fe267-0925-4b91-98ef-82dfb768f916@huawei.com>
Date: Wed, 18 Jun 2025 14:33:32 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Mina Almasry <almasrymina@...gle.com>
CC: Ratheesh Kannoth <rkannoth@...vell.com>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <davem@...emloft.net>, <edumazet@...gle.com>,
	<kuba@...nel.org>, <pabeni@...hat.com>
Subject: Re: [RFC]Page pool buffers stuck in App's socket queue

On 2025/6/18 5:02, Mina Almasry wrote:
> On Mon, Jun 16, 2025 at 11:34 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>
>> On 2025/6/16 16:05, Ratheesh Kannoth wrote:
>>> Hi,
>>>
>>> Recently a customer faced a page pool leak issue and kept getting the following message in the
>>> console:
>>> "page_pool_release_retry() stalled pool shutdown 1 inflight 60 sec"
>>>
>>> The customer runs a "ping" process in the background and then does an interface down/up through the "ip" command.
>>>
>>> The Marvell octeontx2 driver destroys all resources (including the page pool allocated for each queue of
>>> the net device) during the interface down event. The page pool destruction waits for all page pool buffers
>>> allocated by that instance to return to the pool, hence the above message (if some buffers
>>> are stuck).
>>>
>>> In the customer scenario, the ping App opens both RAW and RAW6 sockets. Even though the customer pings
>>> only an IPv4 address, the RAW6 socket receives some IPv6 Router Advertisement messages which get generated
>>> in their network.
>>>
>>> [   41.643448]  raw6_local_deliver+0xc0/0x1d8
>>> [   41.647539]  ip6_protocol_deliver_rcu+0x60/0x490
>>> [   41.652149]  ip6_input_finish+0x48/0x70
>>> [   41.655976]  ip6_input+0x44/0xcc
>>> [   41.659196]  ip6_sublist_rcv_finish+0x48/0x68
>>> [   41.663546]  ip6_sublist_rcv+0x16c/0x22c
>>> [   41.667460]  ipv6_list_rcv+0xf4/0x12c
>>>
>>> Those packets will never get processed, and if the customer does an interface down/up, page pool
>>> warnings will be shown in the console.
>>>
>>> The customer was asking us for a mechanism to drain these sockets, as they don't want to kill their Apps.
>>> The proposal is to have a debugfs entry which shows "pid  last_processed_skb_time  number_of_packets  socket_fd/inode_number"
>>> for each raw6/raw4 socket created in the system, and
>>> any write to the debugfs entry (any specific command) will drain the socket.
>>>
>>> 1. Could you please comment on the proposal ?
>>
>> I would say the above is kind of working around the problem.
>> It would be good to fix the Apps or fix the page_pool.
>>
>>> 2. Could you suggest a better way ?
>>
>> For fixing the page_pool part, I would suggest keeping track
>> of all the inflight pages and detaching those pages from the page_pool when
>> page_pool_destroy() is called. The tracking part was [1]; unfortunately,
>> the maintainers seemed to choose an easy way instead of a long-term
>> direction, see [2].
> 
> This is not that accurate IMO. Your patch series and the merged patch
> series from Toke do the same thing: both keep track of dma-mapped

The main difference is the 'dma-mapped' part above.

> pages, so that they can be unmapped at page_pool_destroy time. Toke
> just did the tracking in a simpler way that people were willing to
> review.
> 
> So, if you had a plan to detach pages on page_pool_destroy on top of
> your tracking, the exact same plan should work on top of Toke's
> tracking. It may be useful to code that and send an RFC if you have
> time. It would indeed fix this periodic warning issue.

As noted above, when page_pool_create() is called without PP_FLAG_DMA_MAP,
tracking only the dma-mapped pages is not enough.
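
To make that concrete, below is a rough sketch (made-up driver names, error
handling trimmed, not taken from any in-tree driver) of a pool created without
PP_FLAG_DMA_MAP, where the driver does its own dma_map_page(). Any tracking
keyed on pages the pool itself dma-mapped sees nothing here, yet a page sitting
in a stuck raw6 socket still keeps page_pool_destroy() waiting:

/* Illustrative only: driver-managed-DMA rx setup, assuming
 * <net/page_pool/helpers.h> as on recent kernels.
 */
#include <linux/dma-mapping.h>
#include <net/page_pool/helpers.h>

static struct page_pool *my_rxq_pool_create(struct device *dev, int nid)
{
	struct page_pool_params pp = {
		.flags		= 0,	/* no PP_FLAG_DMA_MAP: pool maps nothing */
		.order		= 0,
		.pool_size	= 1024,
		.nid		= nid,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp);
}

/* Refill one rx descriptor; the driver, not the pool, owns the mapping. */
static int my_rxq_refill_one(struct page_pool *pool, struct device *dev,
			     dma_addr_t *dma)
{
	struct page *page = page_pool_alloc_pages(pool, GFP_ATOMIC);

	if (!page)
		return -ENOMEM;

	*dma = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, *dma)) {
		page_pool_put_full_page(pool, page, false);
		return -ENOMEM;
	}

	/* Once this page ends up in an skb queued on an unread raw6
	 * socket, page_pool_destroy() on ifdown can only wait for it,
	 * hence "stalled pool shutdown ... inflight" until the socket
	 * is drained or closed.
	 */
	return 0;
}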
