[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231012195127.129585-6-saeed@kernel.org>
Date: Thu, 12 Oct 2023 12:51:22 -0700
From: Saeed Mahameed <saeed@...nel.org>
To: "David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Eric Dumazet <edumazet@...gle.com>
Cc: Saeed Mahameed <saeedm@...dia.com>,
netdev@...r.kernel.org,
Tariq Toukan <tariqt@...dia.com>,
Dragos Tatulea <dtatulea@...dia.com>,
Chris Mason <clm@...com>
Subject: [net 05/10] net/mlx5e: RX, Fix page_pool allocation failure recovery for striding rq
From: Dragos Tatulea <dtatulea@...dia.com>
When a page allocation fails during refill in mlx5e_post_rx_mpwqes, the
page will be released again on the next refill call. This triggers the
page_pool negative page fragment count warning below:
[ 2436.447717] WARNING: CPU: 1 PID: 2419 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
...
[ 2436.447895] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.447991] Call Trace:
[ 2436.447975] mlx5e_post_rx_mpwqes+0x1d5/0xcf0 [mlx5_core]
[ 2436.447994] <IRQ>
[ 2436.447996] ? __warn+0x7d/0x120
[ 2436.448009] ? mlx5e_handle_rx_cqe_mpwrq+0x109/0x1d0 [mlx5_core]
[ 2436.448002] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.448044] ? mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core]
[ 2436.448061] ? report_bug+0x155/0x180
[ 2436.448065] ? handle_bug+0x36/0x70
[ 2436.448067] ? exc_invalid_op+0x13/0x60
[ 2436.448070] ? asm_exc_invalid_op+0x16/0x20
[ 2436.448079] mlx5e_napi_poll+0x122/0x6b0 [mlx5_core]
[ 2436.448077] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.448113] ? generic_exec_single+0x35/0x100
[ 2436.448117] __napi_poll+0x25/0x1a0
[ 2436.448120] net_rx_action+0x28a/0x300
[ 2436.448122] __do_softirq+0xcd/0x279
[ 2436.448126] irq_exit_rcu+0x6a/0x90
[ 2436.448128] sysvec_apic_timer_interrupt+0x6e/0x90
[ 2436.448130] </IRQ>
This patch fixes the striding rq case by setting the skip flag on all
the wqe pages that were expected to have new pages allocated.
Fixes: 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling")
Tested-by: Chris Mason <clm@...com>
Reported-by: Chris Mason <clm@...com>
Closes: https://lore.kernel.org/netdev/117FF31A-7BE0-4050-B2BB-E41F224FF72F@meta.com
Signed-off-by: Dragos Tatulea <dtatulea@...dia.com>
Reviewed-by: Tariq Toukan <tariqt@...dia.com>
Signed-off-by: Saeed Mahameed <saeedm@...dia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3fd11b0761e0..7988b3a9598c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -816,6 +816,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
mlx5e_page_release_fragmented(rq, frag_page);
}
+ bitmap_fill(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
+
err:
rq->stats->buff_alloc_err++;
--
2.41.0
Powered by blists - more mailing lists