Message-ID: <CAHS8izN6M3Rkm_woO9kiqPfHxb6g+=gNo7NEjQBZdA4d+rPPnQ@mail.gmail.com>
Date: Tue, 17 Jun 2025 14:00:04 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Ratheesh Kannoth <rkannoth@...vell.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org, davem@...emloft.net, 
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, 
	linyunsheng@...wei.com
Subject: Re: [RFC]Page pool buffers stuck in App's socket queue

On Mon, Jun 16, 2025 at 1:06 AM Ratheesh Kannoth <rkannoth@...vell.com> wrote:
>
> Hi,
>
> Recently a customer faced a page pool leak issue and keeps getting the following message in
> the console:
> "page_pool_release_retry() stalled pool shutdown 1 inflight 60 sec"
>

This is not exactly a 'leak' per se. The page_pool doesn't allow
itself to exit until all the packets that came from it are freed. The
message just tells the user that this is happening.
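
To make that concrete, the shutdown path in net/core/page_pool.c is a
deferred retry: destroying the pool checks how many pages are still
inflight, and if any remain it just reschedules itself and prints the
warning periodically. A simplified sketch of that logic (not verbatim
upstream code; field names and intervals vary by kernel version):

/* Simplified sketch of the deferred-release logic in net/core/page_pool.c.
 * Not verbatim upstream code; details differ between kernel versions.
 */
static void page_pool_release_retry(struct work_struct *wq)
{
        struct delayed_work *dwq = to_delayed_work(wq);
        struct page_pool *pool = container_of(dwq, struct page_pool, release_dw);
        int inflight;

        /* How many pages from this pool are still out in the wild
         * (e.g. sitting in a socket receive queue)?
         */
        inflight = page_pool_release(pool);
        if (!inflight)
                return; /* all pages returned, the pool has been freed */

        /* Periodic warning, at most once per DEFER_WARN_INTERVAL.
         * Newer kernels skip it for pools still visible via a netdev,
         * see the commit referenced below.
         */
        if (time_after_eq(jiffies, pool->defer_warn)) {
                int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;

                pr_warn("%s() stalled pool shutdown %d inflight %d sec\n",
                        __func__, inflight, sec);
                pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
        }

        /* Pages are still outstanding, so try again later instead of
         * freeing the pool out from under them.
         */
        schedule_delayed_work(&pool->release_dw, DEFER_TIME);
}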

> The customer runs a "ping" process in the background and then does an interface down/up through the "ip" command.
>
> The Marvell octeontx2 driver destroys all resources (including the page pool allocated for each queue of
> the net device) during an interface down event. This page pool destruction waits for all page pool buffers
> allocated by that instance to return to the pool, hence the above message (if some buffers
> are stuck).
>
> In the customer scenario, the ping app opens both RAW and RAW6 sockets. Even though the customer pings
> only an IPv4 address, the RAW6 socket receives some IPv6 Router Advertisement messages which get generated
> in their network.
>
> [   41.643448]  raw6_local_deliver+0xc0/0x1d8
> [   41.647539]  ip6_protocol_deliver_rcu+0x60/0x490
> [   41.652149]  ip6_input_finish+0x48/0x70
> [   41.655976]  ip6_input+0x44/0xcc
> [   41.659196]  ip6_sublist_rcv_finish+0x48/0x68
> [   41.663546]  ip6_sublist_rcv+0x16c/0x22c
> [   41.667460]  ipv6_list_rcv+0xf4/0x12c
>
> Those packets never get processed, and if the customer does an interface down/up, page pool
> warnings will be shown in the console.
>

Right, I have a few recommendations here:

1. Check that commit be0096676e23 ("net: page_pool: mute the periodic
warning for visible page pools") is in your kernel. That mutes
warnings for visible page_pools.

2. Fix the application so it does not leave this data behind in the RAW6
socket. Either processing the incoming data on the socket or closing the
socket itself would be sufficient (see the sketch below).
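
For illustration, a minimal (hypothetical) userspace sketch of option 2;
drain_and_close_raw6() and the 2048-byte buffer are made up for the
example, the point is just that draining and/or closing the RAW6 socket
releases the queued skbs and lets the page_pool finish its shutdown:

/* Hypothetical example: drain and close a raw IPv6 socket so no skbs
 * (and the page_pool pages backing them) stay pinned in its queue.
 * 'fd' stands in for whatever RAW6 socket the application keeps open.
 */
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

static void drain_and_close_raw6(int fd)
{
        char buf[2048];

        /* Make the drain loop terminate instead of blocking. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        /* Consume (and discard) whatever is already queued, e.g. the
         * stray IPv6 Router Advertisements delivered to the RAW6 socket.
         */
        while (recv(fd, buf, sizeof(buf), 0) > 0)
                ;

        /* Closing the socket purges its receive queue, which frees the
         * skbs and returns their pages to the page_pool.
         */
        close(fd);
}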

> The customer was asking us for a mechanism to drain these sockets, as they don't want to kill their apps.
> The proposal is to have a debugfs file which shows "pid  last_processed_skb_time  number_of_packets  socket_fd/inode_number"
> for each raw6/raw4 socket created in the system, and
> any write to the debugfs file (some specific command) will drain the socket.
>
> 1. Could you please comment on the proposal?

Oh boy. I don't think this would fly at all. Userspace simply closing
the RAW6 socket would 'fix' the issue, unless I'm missing something.

Having a roundabout debugfs entry that does the same thing that
`close(socket_fd);` would do is going to be a very hard sell upstream.

I think we could also mute the page_pool warning or make it less
visible. The kernel usually doesn't warn when userspace is leaking
data.

We could also do what Yunsheng suggests and actually disconnect the
pages from the page_pool and let the page_pool clean up, but that may
be a complicated change.

Honestly, there are a lot of better solutions here than this debugfs file.

-- 
Thanks,
Mina
