Message-ID: <5ca349f4-dabe-48d4-8c52-1b02c7650104@nvidia.com>
Date: Wed, 23 Apr 2025 14:47:57 +0300
From: Yael Chemla <ychemla@...dia.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: davem@...emloft.net, edumazet@...gle.com, horms@...nel.org,
 kuba@...nel.org, kuni1840@...il.com, netdev@...r.kernel.org,
 pabeni@...hat.com
Subject: Re: [PATCH v5 net 2/3] net: Fix dev_net(dev) race in
 unregister_netdevice_notifier_dev_net().

On 06/04/2025 18:37, Yael Chemla wrote:
> On 02/04/2025 0:58, Kuniyuki Iwashima wrote:
>> Hi Yael,
>>
>> Thanks for testing!
>>
>> From: Yael Chemla <ychemla@...dia.com>
>> Date: Tue, 1 Apr 2025 23:49:42 +0300
>>> Hi Kuniyuki,
>>> Sorry for the delay (I was OOO). I tested your patch, and while the race
>>> occurs much less frequently, it still happens; see the warnings and call
>>> traces below.
>>> Additionally, in some cases, the test that reproduces the race hangs.
>>> Debugging shows that we're stuck in an endless loop inside
>>> rtnl_net_dev_lock because the passive refcount is already zero, causing
>>> net_passive_inc_not_zero to return false, so it jumps to "again" and this
>>> repeats without ending.
>>> I suspect, as you mentioned before, that in such cases, the passive
>>> refcount was decreased from cleanup_net.
>>
>> This sounds weird.
>>
>> We assumed vif would be moved to init_net, so the infinite loop
>> should never happen.
>>
>> So the assumption was wrong and vif belonged to the dead netns and
>> was not moved to init_net ... ??
>>
>> Even if dev_change_net_namespace() fails, it leads to BUG().
>>
> 
> Hi Kuniyuki,
> In failure scenarios, we observe that cleanup_net is invoked, followed
> by net_passive_dec, which reduces the passive refcount to zero. These
> are called before the call to unregister_netdevice_notifier_dev_net.
> 
> During the test, dev_change_net_namespace is called once, but it
> operates on a different net_device pointer than the one passed to the
> final call of unregister_netdevice_notifier_dev_net, the call that enters
> the infinite loop (with net->passive=0 and net->ns.count=0, inside
> rtnl_net_dev_lock, as explained in my previous mail).
> 
> Do you need additional debug information, perhaps specific details
> regarding reassigning the netns to init_net? Please let me know how I
> can help further.
> 

Hi Kuniyuki,
Any updates on this?
Thanks,
Yael

>>>
>>>
>>> warnings and call traces:
>>>
>>> refcount_t: addition on 0; use-after-free.
>>
>> I guess this is from the old log or the test patch was not applied
>> because _inc_not_zero() will trigger REFCOUNT_ADD_NOT_ZERO_OVF and
>> then the message will be
>>
>>   refcount_t: saturated; leaking memory
>>
>> , see __refcount_add_not_zero() and refcount_warn_saturate().
>>
> 
> You are right, that was a mistake. I was unable to reproduce another
> failure with call trace info. The test either succeeds or hangs (infinite loop).
> 
>>
>>> WARNING: CPU: 4 PID: 27219 at lib/refcount.c:25 refcount_warn_saturate
>>> (/usr/work/linux/lib/refcount.c:25 (discriminator 1))
> 
> 
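
For anyone following the hang described in this thread, here is a minimal
userspace sketch of why an "increment unless zero" retry loop cannot make
progress once the count has reached zero. It is only a model built on C11
atomics: struct model_net, model_passive_inc_not_zero() and
model_lock_with_retry() are hypothetical names, not the kernel's
refcount_t, net_passive_inc_not_zero() or rtnl_net_dev_lock(), and the
retry loop is bounded here so that the example terminates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a netns with a passive refcount (hypothetical). */
struct model_net {
    atomic_int passive;
};

/* Take a reference only if the count is still non-zero. */
static bool model_passive_inc_not_zero(struct model_net *net)
{
    int old = atomic_load(&net->passive);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&net->passive, &old, old + 1))
            return true;
    }
    return false; /* count was zero: no reference can be taken */
}

/*
 * Bounded model of the "again:" retry described in the thread.
 * Returns the number of attempts used on success, or -1 when
 * max_attempts is exhausted (the unbounded version would spin).
 */
static int model_lock_with_retry(struct model_net *net, int max_attempts)
{
    assert(max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (model_passive_inc_not_zero(net))
            return attempt;
        /* The code described in the thread jumps back to "again" here. */
    }
    return -1;
}
```

With a count of 1 the first attempt succeeds; with a count of 0 every
attempt fails, since nothing in the loop can raise the count again. That
matches the reported state (net->passive=0) in which the test hangs.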

