[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250214064250.85987-1-kerneljasonxing@gmail.com>
Date: Fri, 14 Feb 2025 14:42:50 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
horms@...nel.org,
hawk@...nel.org,
ilias.apalodimas@...aro.org,
almasrymina@...gle.com
Cc: netdev@...r.kernel.org,
Jason Xing <kerneljasonxing@...il.com>
Subject: [PATCH net-next v3] page_pool: avoid infinite loop to schedule delayed worker
We noticed the kworker in page_pool_release_retry() was waken
up repeatedly and infinitely in production because of the
buggy driver causing the inflight less than 0 and warning
us in page_pool_inflight()[1].
Since the inflight value goes negative, it means we should
not expect the whole page_pool to get back to work normally.
This patch mitigates the adverse effect by not rescheduling
the kworker when detecting the inflight negative in
page_pool_release_retry().
[1]
[Mon Feb 10 20:36:11 2025] ------------[ cut here ]------------
[Mon Feb 10 20:36:11 2025] Negative(-51446) inflight packet-pages
...
[Mon Feb 10 20:36:11 2025] Call Trace:
[Mon Feb 10 20:36:11 2025] page_pool_release_retry+0x23/0x70
[Mon Feb 10 20:36:11 2025] process_one_work+0x1b1/0x370
[Mon Feb 10 20:36:11 2025] worker_thread+0x37/0x3a0
[Mon Feb 10 20:36:11 2025] kthread+0x11a/0x140
[Mon Feb 10 20:36:11 2025] ? process_one_work+0x370/0x370
[Mon Feb 10 20:36:11 2025] ? __kthread_cancel_work+0x40/0x40
[Mon Feb 10 20:36:11 2025] ret_from_fork+0x35/0x40
[Mon Feb 10 20:36:11 2025] ---[ end trace ebffe800f33e7e34 ]---
Note: before this patch, the above calltrace would flood the
dmesg due to repeated reschedule of release_dw kworker.
Signed-off-by: Jason Xing <kerneljasonxing@...il.com>
---
v3
Link: https://lore.kernel.org/all/20250213052150.18392-1-kerneljasonxing@gmail.com/
1. remove printing warning even when inflight is negative, or else
it will cause panic.
v2
Link: https://lore.kernel.org/all/20250210130953.26831-1-kerneljasonxing@gmail.com/
1. add more details in commit message.
2. allow printing once when the inflight is negative.
3. correct the position where to stop the reschedule.
Suggested by Mina and Jakub.
---
net/core/page_pool.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1c6fec08bc43..acef1fcd8ddc 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct work_struct *wq)
int inflight;
inflight = page_pool_release(pool);
- if (!inflight)
+ /* In rare cases, a driver bug may cause inflight to go negative.
+ * Don't reschedule release if inflight is 0 or negative.
+ * - If 0, the page_pool has been destroyed
+ * - if negative, we will never recover
+ * in both cases no reschedule is necessary.
+ */
+ if (inflight <= 0)
return;
/* Periodic warning for page pools the user can't see */
--
2.43.5
Powered by blists - more mailing lists