Message-ID: <20230928234735.3026489-1-clm@fb.com>
Date: Thu, 28 Sep 2023 16:47:35 -0700
From: Chris Mason <clm@...com>
To: <netdev@...r.kernel.org>, <kuba@...nel.org>, <dw@...idwei.uk>,
        <dtatulea@...dia.com>, <saeedm@...dia.com>
Subject: [PATCH RFC] net/mlx5e: avoid page pool frag counter underflow

[ This is just an RFC because I've wandered pretty far from home and
really don't know the code at hand.  The errors are real though: an ENOMEM
during mlx5e_refill_rx_wqes() leads to underflows and system instability. ]

mlx5e_refill_rx_wqes() has roughly the following flow:

1) mlx5e_free_rx_wqes()
2) mlx5e_alloc_rx_wqes()

We're doing bulk frees before refilling the frags in bulk, and under
normal conditions this is all well balanced.  Every time we call
mlx5e_refill_rx_wqes(), the first thing we do is free the existing frags.

But, if we get an ENOMEM from mlx5e_get_rx_frag(), we will have called
mlx5e_free_rx_wqes() on a bunch of frags without refilling the pages for
them.
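
Roughly, the sequence looks like this (a paraphrased sketch with stand-in
arguments and return values, not the actual driver code; it assumes the
alloc helper reports how far it got):

        /* paraphrased sketch only, not the real mlx5e code */
        mlx5e_free_rx_wqes(rq, ix, wqe_bulk);          /* 1) bulk free old frags     */
        count = mlx5e_alloc_rx_wqes(rq, ix, wqe_bulk); /* 2) refill, ENOMEM possible */

        /* wqes past 'count' were freed above but never refilled; the next
         * refill pass frees those same frags a second time
         */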

mlx5e_page_release_fragmented() doesn't take any steps to remember that
a given frag has already been put through page_pool_defrag_page(), so in
the ENOMEM case, repeated calls to mlx5e_free_rx_wqes() without
corresponding allocations end up underflowing the counter in
page_pool_defrag_page():

        ret = atomic_long_sub_return(nr, &page->pp_frag_count);
        WARN_ON(ret < 0);
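
For what it's worth, here's a throwaway userspace model of that arithmetic
(made-up bias and frag counts standing in for MLX5E_PAGECNT_BIAS_MAX and
page->pp_frag_count, not driver code).  The second release drains a bias
that was only ever charged once, so the counter goes negative:

        #include <stdio.h>
        #include <stdatomic.h>

        #define BIAS_MAX 64     /* made-up stand-in for MLX5E_PAGECNT_BIAS_MAX */

        static atomic_long pp_frag_count;

        struct frag_page {
                long frags;     /* frags handed out since the page was taken */
        };

        static void take_page(struct frag_page *fp)
        {
                /* models page_pool_fragment_page(): the bias is charged once */
                atomic_store(&pp_frag_count, BIAS_MAX);
                fp->frags = 3;  /* pretend 3 frags got used */
        }

        static void release_fragmented(struct frag_page *fp)
        {
                /* models the drain done by mlx5e_page_release_fragmented()
                 * via page_pool_defrag_page()
                 */
                long drain = BIAS_MAX - fp->frags;
                long ret = atomic_fetch_sub(&pp_frag_count, drain) - drain;

                printf("pp_frag_count is now %ld%s\n",
                       ret, ret < 0 ? "   <-- WARN_ON(ret < 0)" : "");
        }

        int main(void)
        {
                struct frag_page fp;

                take_page(&fp);
                release_fragmented(&fp);  /* refill pass N: free            */
                /* ENOMEM on refill, so no take_page() before the next pass */
                release_fragmented(&fp);  /* refill pass N+1: free again    */
                return 0;
        }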

Reproducing this just needs a memory hog driving the system into OOM and
a heavy network rx load.

My guess at a fix is to clear the frag's state so we don't send the page
through defrag more than once.  I've only lightly tested this, but it
doesn't immediately crash on OOM anymore and doesn't seem to leak.

Fixes: 6f5742846053c7 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
Signed-off-by: Chris Mason <clm@...com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3fd11b0761e0..9a7b10f0bba9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -298,6 +298,16 @@ static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
 	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
 	struct page *page = frag_page->page;
 
+	if (!page)
+		return;
+
+	/*
+	 * we're dropping all of our counts on this page, make sure we
+	 * don't do it again the next time we process this frag
+	 */
+	frag_page->frags = 0;
+	frag_page->page = NULL;
+
 	if (page_pool_defrag_page(page, drain_count) == 0)
 		page_pool_put_defragged_page(rq->page_pool, page, -1, true);
 }
-- 
2.34.1

