Message-ID: <a0dacf7c3a869ef94db8e42e22d71edea6cbd8d8.camel@nvidia.com>
Date: Fri, 29 Sep 2023 09:06:12 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: "dw@...idwei.uk" <dw@...idwei.uk>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"clm@...com" <clm@...com>, Saeed Mahameed <saeedm@...dia.com>,
	"kuba@...nel.org" <kuba@...nel.org>
CC: Tariq Toukan <tariqt@...dia.com>
Subject: Re: [PATCH RFC] net/mlx5e: avoid page pool frag counter underflow

Hi Chris,

On Thu, 2023-09-28 at 16:47 -0700, Chris Mason wrote:
> [ This is just an RFC because I've wandered pretty far from home and
> really don't know the code at hand. The errors are real though: ENOMEM
> during mlx5e_refill_rx_wqes() leads to underflows and system instability. ]
>
> mlx5e_refill_rx_wqes() has roughly the following flow:
>
> 1) mlx5e_free_rx_wqes()
> 2) mlx5e_alloc_rx_wqes()
>
> We're doing bulk frees before refilling the frags in bulk, and under
> normal conditions this is all well balanced. Every time we try to
> refill_rx_wqes(), the first thing we do is free the existing ones.
>
> But if we get an ENOMEM from mlx5e_get_rx_frag(), we will have called
> mlx5e_free_rx_wqes() on a bunch of frags without refilling the pages
> for them.
>
> mlx5e_page_release_fragmented() doesn't take any steps to remember that
> a given frag has been put through page_pool_defrag_page(), so in the
> ENOMEM case, repeated calls to free_rx_wqes without corresponding
> allocations end up underflowing in page_pool_defrag_page():
>
>	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
>	WARN_ON(ret < 0);
>
> Reproducing this just needs a memory hog driving the system into OOM
> and a heavy network rx load.
>
Yeap, this is a problem. Thanks for finding this!

> My guess at a fix is to update our frag to make sure we don't send it
> through defrag more than once. I've only lightly tested this, but it
> doesn't immediately crash on OOM anymore and doesn't seem to leak.
>
> Fixes: 6f5742846053c7 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
> Signed-off-by: Chris Mason <clm@...com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 3fd11b0761e0..9a7b10f0bba9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -298,6 +298,16 @@ static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
>  	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
>  	struct page *page = frag_page->page;
>
> +	if (!page)
> +		return;
> +

Ideally we'd like to avoid this kind of broad check as it can hide other
issues.

> +	/*
> +	 * We're dropping all of our counts on this page; make sure we
> +	 * don't do it again the next time we process this frag.
> +	 */
> +	frag_page->frags = 0;
> +	frag_page->page = NULL;
> +
>  	if (page_pool_defrag_page(page, drain_count) == 0)
>  		page_pool_put_defragged_page(rq->page_pool, page, -1, true);
>  }

We already have a mechanism to avoid double releases: setting the
MLX5E_WQE_FRAG_SKIP_RELEASE bit on the mlx5e_wqe_frag_info flags field.
When mlx5e_alloc_rx_wqes() fails, we should set that bit on the remaining
frag_pages (a rough sketch of this idea follows below). This is for
legacy rq mode; multi-packet wqe rq mode has to be handled in a similar
way as well.

If I send a patch later, would you be able to test it?

Thanks,
Dragos
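
For illustration, a minimal sketch of the idea Dragos describes above,
assuming the legacy-rq helpers in en_rx.c (get_frag(),
mlx5_wq_cyc_ctr2ix(), rq->wqe.info.num_frags) keep their current shape.
The helper name mlx5e_mark_rx_wqes_skip_release() is invented for this
sketch; it is an illustration of the approach, not the eventual patch.

	/*
	 * Hypothetical helper (name invented for this sketch): after
	 * mlx5e_alloc_rx_wqes() stops early on ENOMEM, flag every frag
	 * of the wqes that were freed but not re-filled so that the
	 * next mlx5e_free_rx_wqes() pass skips them instead of draining
	 * their page-pool fragment counts a second time.
	 */
	static void mlx5e_mark_rx_wqes_skip_release(struct mlx5e_rq *rq,
						    u16 ix, int wqe_bulk)
	{
		struct mlx5_wq_cyc *wq = &rq->wqe.wq;
		int i;

		for (i = 0; i < wqe_bulk; i++) {
			int j = mlx5_wq_cyc_ctr2ix(wq, ix + i);
			struct mlx5e_wqe_frag_info *frag = get_frag(rq, j);
			int f;

			/* All frags of this wqe are unpopulated; skip
			 * them on the next release pass.
			 */
			for (f = 0; f < rq->wqe.info.num_frags; f++, frag++)
				frag->flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
		}
	}

mlx5e_refill_rx_wqes() could call this for the bulk entries that were
freed but not re-allocated, with the bit cleared again on the next
successful mlx5e_get_rx_frag(); the multi-packet wqe path would need
equivalent handling, as noted above.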