Message-ID: <2930648.1757463506@famine>
Date: Tue, 09 Sep 2025 17:18:26 -0700
From: Jay Vosburgh <jv@...sburgh.net>
To: Jakub Kicinski <kuba@...nel.org>
cc: Calvin Owens <calvin@...nvd.org>, Breno Leitao <leitao@...ian.org>,
    Andrew Lunn <andrew+netdev@...n.ch>,
    "David S. Miller" <davem@...emloft.net>,
    Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
    Shuah Khan <shuah@...nel.org>, Simon Horman <horms@...nel.org>,
    david decotigny <decot@...glers.com>, linux-kernel@...r.kernel.org,
    netdev@...r.kernel.org, linux-kselftest@...r.kernel.org,
    asantostc@...il.com, efault@....de, kernel-team@...a.com,
    stable@...r.kernel.org
Subject: Re: [PATCH net v3 1/3] netpoll: fix incorrect refcount handling
 causing incorrect cleanup

Jakub Kicinski <kuba@...nel.org> wrote:

>On Mon, 8 Sep 2025 13:47:24 -0700 Calvin Owens wrote:
>> I wonder if there might be a demon lurking in bonding+netpoll that this
>> was papering over? Not a reason not to fix the leaks IMO, I'm just
>> curious, I don't want to spend time on it if you already did :)
>
>+1, I also feel like it'd be good to have some bonding tests in place
>when we're removing a hack added specifically for bonding.

	I'll preface this by saying up front that I'm not super
familiar with the innards of netpoll.

	That said, I looked at commit efa95b01da18 ("netpoll: fix use
after free") and the relevant upstream discussion, and I'm not sure the
assertion that "After a bonding master reclaims the netpoll info struct,
slaves could still hold a pointer to the reclaimed data" is correct.

	First, I'm not sure the reference count math in efa95b01da18 is
correct (more on that below).

	Second, I'm a bit unsure what happens to the struct netpoll
*np parameter of __netpoll_setup for the second and subsequent netpoll
instances on the same device (i.e., the second and later calls), since
the function unconditionally does

	npinfo->netpoll = np;

	which, as far as I can tell, would overwrite the "np" supplied
by any prior call to __netpoll_setup.  In bonding, slave_enable_netpoll()
stashes the "np" it allocates as slave->np, and slave_disable_netpoll()
relies on __netpoll_free to free it, so I don't think it's lost, but it
seems that netpoll internally tracks only one of these at a time,
regardless of the reference count.

	On the reference counting, the upstream example from the prior
discussion includes:

    mkdir /sys/kernel/config/netconsole/blah
    echo 0 > /sys/kernel/config/netconsole/blah/enabled
    echo bond0 > /sys/kernel/config/netconsole/blah/dev_name
    echo 192.168.56.42 > /sys/kernel/config/netconsole/blah/remote_ip
    echo 1 > /sys/kernel/config/netconsole/blah/enabled
    # npinfo refcnt ->1
    ifenslave bond0 eth1
    # npinfo refcnt ->2
    ifenslave bond0 eth0
    # (this should be optional, preventing ndo_cleanup_nepoll below)
    # npinfo refcnt ->3

	I'm suspicious of the refcnt values here: both then and now, the
npinfo for each relevant interface is a separate per-interface
allocation in __netpoll_setup, so I'm not sure what exactly is supposed
to reach a refcnt of 3.

	If two netpoll instances are using the slave in question
(either directly or via the bond itself), then clearing the
np->dev->npinfo pointer before the last reference is released looks
like the wrong thing to do.

	-J

---
	-Jay Vosburgh, jv@...sburgh.net
