[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e32387f1-fcf6-41fd-89a9-0b54637c99c9@intel.com>
Date: Thu, 16 Jan 2025 14:59:01 +0100
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Antoine Tenart <atenart@...nel.org>
CC: <ecree.xilinx@...il.com>, <netdev@...r.kernel.org>, <davem@...emloft.net>,
<kuba@...nel.org>, <pabeni@...hat.com>, <edumazet@...gle.com>
Subject: Re: [PATCH net v2] net: avoid race between device unregistration and
ethnl ops
On 1/16/25 14:47, Antoine Tenart wrote:
> Quoting Przemek Kitszel (2025-01-16 10:44:40)
>> On 1/16/25 10:21, Antoine Tenart wrote:
>>> The following trace can be seen if a device is being unregistered while
>>> its number of channels are being modified.
>>>
>>> DEBUG_LOCKS_WARN_ON(lock->magic != lock)
>>> WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120
>>> CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771
>>> RIP: 0010:__mutex_lock+0xc8a/0x1120
>>> Call Trace:
>>> <TASK>
>>> ethtool_check_max_channel+0x1ea/0x880
>>> ethnl_set_channels+0x3c3/0xb10
>>> ethnl_default_set_doit+0x306/0x650
>>> genl_family_rcv_msg_doit+0x1e3/0x2c0
>>> genl_rcv_msg+0x432/0x6f0
>>> netlink_rcv_skb+0x13d/0x3b0
>>> genl_rcv+0x28/0x40
>>> netlink_unicast+0x42e/0x720
>>> netlink_sendmsg+0x765/0xc20
>>> __sys_sendto+0x3ac/0x420
>>> __x64_sys_sendto+0xe0/0x1c0
>>> do_syscall_64+0x95/0x180
>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> This is because unregister_netdevice_many_notify might run before the
>>> rtnl lock section of ethnl operations, eg. set_channels in the above
>>> example. In this example the rss lock would be destroyed by the device
>>> unregistration path before being used again, but in general running
>>> ethnl operations while dismantle has started is not a good idea.
>>>
>>> Fix this by denying any operation on devices being unregistered. A check
>>> was already there in ethnl_ops_begin, but not wide enough.
>>>
>>> Note that the same issue cannot be seen on the ioctl version
>>> (__dev_ethtool) because the device reference is retrieved from within
>>> the rtnl lock section there. Once dismantle started, the net device is
>>> unlisted and no reference will be found.
>>>
>>> Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered")
>>> Signed-off-by: Antoine Tenart <atenart@...nel.org>
>>> ---
>>
>> for future submissions, please add a changelog and a link to previous
>> revisions
>
> This one was a bit special as v2 is completely different from v1, not
> much to describe. But sure, at least a link could help.
>
>>> net/ethtool/netlink.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
>>> index e3f0ef6b851b..4d18dc29b304 100644
>>> --- a/net/ethtool/netlink.c
>>> +++ b/net/ethtool/netlink.c
>>> @@ -90,7 +90,7 @@ int ethnl_ops_begin(struct net_device *dev)
>>> pm_runtime_get_sync(dev->dev.parent);
>>>
>>> if (!netif_device_present(dev) ||
>>> - dev->reg_state == NETREG_UNREGISTERING) {
>>> + dev->reg_state >= NETREG_UNREGISTERING) {
>>
>> looks good, but I would add a comment above enum netdev_reg_state
>> definition, to avoid any new state added "at the end"
with your interest of more improvements in the area, current patch is
fine for me as a fix, so:
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@...el.com>
>>
>> what about NETREG_DUMMY? you want to cover it here too?
>
> I'm not super familiar with NETREG_DUMMY but my understanding is those
> devices aren't listed and aren't accessible through ethnl.
>
> Having said that I do agree the checks on reg_state could be
> consolidated, eg. reusing and improving dev_isalive(). I actually
> planned to have a look at if this would make sense later on.
>
> tl;dr; I don't think there's an issue in practice but we could probably
> consolidate the code to make things easier to maintain and to read.
agree
>
> Thanks,
> Antoine
Powered by blists - more mailing lists