lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250213052150.18392-1-kerneljasonxing@gmail.com>
Date: Thu, 13 Feb 2025 13:21:50 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: davem@...emloft.net,
	edumazet@...gle.com,
	kuba@...nel.org,
	pabeni@...hat.com,
	horms@...nel.org,
	hawk@...nel.org,
	ilias.apalodimas@...aro.org,
	almasrymina@...gle.com
Cc: netdev@...r.kernel.org,
	Jason Xing <kerneljasonxing@...il.com>
Subject: [PATCH v2 net-next] page_pool: avoid infinite loop to schedule delayed worker

We noticed the kworker in page_pool_release_retry() was waken
up repeatedly and infinitely in production because of the
buggy driver causing the inflight less than 0 and warning
us in page_pool_inflight()[1].

Since the inflight value goes negative, it means we should
not expect the whole page_pool to get back to work normally.

This patch mitigates the adverse effect by not rescheduling
the kworker when detecting the inflight negative in
page_pool_release_retry().

This patch allows to pr_warn() once in page_pool_release_retry()
in this case.

[1]
[Mon Feb 10 20:36:11 2025] ------------[ cut here ]------------
[Mon Feb 10 20:36:11 2025] Negative(-51446) inflight packet-pages
...
[Mon Feb 10 20:36:11 2025] Call Trace:
[Mon Feb 10 20:36:11 2025]  page_pool_release_retry+0x23/0x70
[Mon Feb 10 20:36:11 2025]  process_one_work+0x1b1/0x370
[Mon Feb 10 20:36:11 2025]  worker_thread+0x37/0x3a0
[Mon Feb 10 20:36:11 2025]  kthread+0x11a/0x140
[Mon Feb 10 20:36:11 2025]  ? process_one_work+0x370/0x370
[Mon Feb 10 20:36:11 2025]  ? __kthread_cancel_work+0x40/0x40
[Mon Feb 10 20:36:11 2025]  ret_from_fork+0x35/0x40
[Mon Feb 10 20:36:11 2025] ---[ end trace ebffe800f33e7e34 ]---
Note: before this patch, the above calltrace would flood the
dmesg due to repeated reschedule of release_dw kworker.

Signed-off-by: Jason Xing <kerneljasonxing@...il.com>
---
v2
Link: https://lore.kernel.org/all/20250210130953.26831-1-kerneljasonxing@gmail.com/
1. add more details in commit message.
2. allow printing once when the inflight is negative.
3. correct the position where to stop the reschedule.
Suggested by Mina and Jakub.
---
 net/core/page_pool.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1c6fec08bc43..e1f89a19a6b6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1112,13 +1112,12 @@ static void page_pool_release_retry(struct work_struct *wq)
 	int inflight;
 
 	inflight = page_pool_release(pool);
-	if (!inflight)
-		return;
 
 	/* Periodic warning for page pools the user can't see */
 	netdev = READ_ONCE(pool->slow.netdev);
 	if (time_after_eq(jiffies, pool->defer_warn) &&
-	    (!netdev || netdev == NET_PTR_POISON)) {
+	    (!netdev || netdev == NET_PTR_POISON) &&
+	    inflight != 0) {
 		int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
 
 		pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
@@ -1126,7 +1125,15 @@ static void page_pool_release_retry(struct work_struct *wq)
 		pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
 	}
 
-	/* Still not ready to be disconnected, retry later */
+	/* In rare cases, a driver bug may cause inflight to go negative.
+	 * Don't reschedule release if inflight is 0 or negative.
+	 * - If 0, the page_pool has been destroyed
+	 * - if negative, we will never recover
+	 *   in both cases no reschedule is necessary.
+	 */
+	if (inflight <= 0)
+		return;
+
 	schedule_delayed_work(&pool->release_dw, DEFER_TIME);
 }
 
-- 
2.43.5


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ