netdev - Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling causing incorrect cleanup

Message-ID: <aL9A3JDyx3TxAzLf@mozart.vkv.me>
Date: Mon, 8 Sep 2025 13:47:24 -0700
From: Calvin Owens <calvin@...nvd.org>
To: Breno Leitao <leitao@...ian.org>
Cc: Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Shuah Khan <shuah@...nel.org>, Simon Horman <horms@...nel.org>,
	david decotigny <decot@...glers.com>, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, linux-kselftest@...r.kernel.org,
	asantostc@...il.com, efault@....de, kernel-team@...a.com,
	stable@...r.kernel.org, jv@...sburgh.net
Subject: Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling
 causing incorrect cleanup

On Friday 09/05 at 10:25 -0700, Breno Leitao wrote:
> commit efa95b01da18 ("netpoll: fix use after free") incorrectly
> ignored the refcount and prematurely set dev->npinfo to NULL during
> netpoll cleanup, leading to improper behavior and memory leaks.
> 
> Scenario causing lack of proper cleanup:
> 
> 1) A netpoll is associated with a NIC (e.g., eth0) and netdev->npinfo is
>    allocated, and refcnt = 1
>    - Keep in mind that npinfo is shared among all netpoll instances. In
>      this case, there is just one.
> 
> 2) Another netpoll is also associated with the same NIC and
>    npinfo->refcnt += 1.
>    - Now dev->npinfo->refcnt = 2;
>    - There is just one npinfo associated with the netdev.
> 
> 3) When the first netpoll goes to clean up:
>    - The first cleanup succeeds and clears np->dev->npinfo, ignoring
>      refcnt.
>      - It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);`
>    - dev->npinfo is set to NULL without proper cleanup
>    - ->ndo_netpoll_cleanup() is not called either
> 
> 4) Now the second target tries to clean up
>    - The second cleanup fails because np->dev->npinfo is already NULL.
>      * In this case, ops->ndo_netpoll_cleanup() was never called, and
>        the skb pool is not cleaned up either (for the second netpoll
>        instance)
>    - This leaks npinfo and skbpool skbs, which is clearly reported by
>      kmemleak.
> 
> Revert commit efa95b01da18 ("netpoll: fix use after free") and add
> clarifying comments emphasizing that npinfo cleanup should only happen
> once the refcount reaches zero, ensuring stable and correct netpoll
> behavior.

This makes sense to me.

Just curious, did you try the original OOPS reproducer?
https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.1404857349.git.decot@googlers.com/

I wonder if there might be a demon lurking in bonding+netpoll that this
was papering over? Not a reason not to fix the leaks IMO, I'm just
curious, I don't want to spend time on it if you already did :)

The discussion on v1 isn't enlightening either:
https://lore.kernel.org/lkml/0f692012238337f2c40893319830ae042523ce18.1404172155.git.decot@googlers.com/

Thanks,
Calvin

> Cc: stable@...r.kernel.org
> Cc: jv@...sburgh.net
> Fixes: efa95b01da18 ("netpoll: fix use after free")
> Signed-off-by: Breno Leitao <leitao@...ian.org>
> ---
>  net/core/netpoll.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 5f65b62346d4e..19676cd379640 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -815,6 +815,10 @@ static void __netpoll_cleanup(struct netpoll *np)
>  	if (!npinfo)
>  		return;
>  
> +	/* At this point, there is a single npinfo instance per netdevice, and
> +	 * its refcnt tracks how many netpoll structures are linked to it. We
> +	 * only perform npinfo cleanup when the refcnt decrements to zero.
> +	 */
>  	if (refcount_dec_and_test(&npinfo->refcnt)) {
>  		const struct net_device_ops *ops;
>  
> @@ -824,8 +828,7 @@ static void __netpoll_cleanup(struct netpoll *np)
>  
>  		RCU_INIT_POINTER(np->dev->npinfo, NULL);
>  		call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
> -	} else
> -		RCU_INIT_POINTER(np->dev->npinfo, NULL);
> +	}
>  
>  	skb_pool_flush(np);
>  }
> 
> -- 
> 2.47.3
>
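
The four-step scenario in the quoted commit message can be reproduced in a small userspace model. This is only a sketch: the `*_model` structs, `model_setup()`, `model_cleanup()` and `run_scenario()` are hypothetical stand-ins, not kernel APIs, and a plain `int` replaces `refcount_t` and RCU. It shows how clearing `dev->npinfo` in the `else` branch makes the second cleanup return early, so the shared npinfo is never freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel structures; all names are hypothetical. */
struct npinfo_model {
	int refcnt;                     /* how many netpolls share this npinfo */
};

struct netdev_model {
	struct npinfo_model *npinfo;    /* models dev->npinfo */
	int live_npinfo;                /* allocated minus freed npinfo objects */
	int ndo_cleanup_calls;          /* models ops->ndo_netpoll_cleanup() calls */
};

struct netpoll_model {
	struct netdev_model *dev;
};

/* Attach a netpoll: allocate npinfo on first use, otherwise take a reference. */
static void model_setup(struct netpoll_model *np, struct netdev_model *dev)
{
	np->dev = dev;
	if (!dev->npinfo) {
		dev->npinfo = calloc(1, sizeof(*dev->npinfo));
		assert(dev->npinfo != NULL);
		dev->live_npinfo++;
	}
	dev->npinfo->refcnt++;
}

/* Detach a netpoll. With buggy == true the pre-patch else branch is modelled. */
static void model_cleanup(struct netpoll_model *np, bool buggy)
{
	struct npinfo_model *npinfo = np->dev->npinfo;

	if (!npinfo)
		return;                 /* the second netpoll exits here when buggy */

	if (--npinfo->refcnt == 0) {
		/* Last user: run the driver hook, unpublish and free npinfo. */
		np->dev->ndo_cleanup_calls++;
		np->dev->npinfo = NULL;
		free(npinfo);
		np->dev->live_npinfo--;
	} else if (buggy) {
		/* Pre-patch behaviour: pointer cleared while refcnt is still > 0. */
		np->dev->npinfo = NULL;
	}
}

/* Two netpolls on one device; returns the number of npinfo objects leaked. */
static int run_scenario(bool buggy)
{
	struct netdev_model dev = {0};
	struct netpoll_model first, second;

	model_setup(&first, &dev);      /* step 1: refcnt = 1 */
	model_setup(&second, &dev);     /* step 2: refcnt = 2 */
	model_cleanup(&first, buggy);   /* step 3 */
	model_cleanup(&second, buggy);  /* step 4 */
	return dev.live_npinfo;
}
```

In this model, `run_scenario(true)` leaves one npinfo allocated and never reaches the driver hook, while `run_scenario(false)` frees it on the second cleanup. The model deliberately leaves out `skb_pool_flush()`, which the real `__netpoll_cleanup()` also skips for the second netpoll because of the early return.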