Message-ID: <45fc873b-9b71-adf2-8f2f-17134344e490@blackwall.org>
Date:   Mon, 13 Mar 2023 12:52:44 +0200
From:   Nikolay Aleksandrov <razor@...ckwall.org>
To:     Shigeru Yoshida <syoshida@...hat.com>
Cc:     j.vosburgh@...il.com, andy@...yhouse.net, davem@...emloft.net,
        edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        Nikolay Aleksandrov <nikolay@...ulusnetworks.com>,
        syzbot+9dfc3f3348729cc82277@...kaller.appspotmail.com
Subject: Re: [PATCH net] bonding: Fix warning in default_device_exit_batch()

On 13/03/2023 11:35, Shigeru Yoshida wrote:
> Hi Nik,
> 
> On Sun, Mar 12, 2023 at 10:58:18PM +0200, Nikolay Aleksandrov wrote:
>> On 12/03/2023 17:21, Shigeru Yoshida wrote:
>>> syzbot reported warning in default_device_exit_batch() like below [1]:
>>>
>>> WARNING: CPU: 1 PID: 56 at net/core/dev.c:10867 unregister_netdevice_many_notify+0x14cf/0x19f0 net/core/dev.c:10867
>>> ...
>>> Call Trace:
>>>  <TASK>
>>>  unregister_netdevice_many net/core/dev.c:10897 [inline]
>>>  default_device_exit_batch+0x451/0x5b0 net/core/dev.c:11350
>>>  ops_exit_list+0x125/0x170 net/core/net_namespace.c:174
>>>  cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613
>>>  process_one_work+0x9bf/0x1820 kernel/workqueue.c:2390
>>>  worker_thread+0x669/0x1090 kernel/workqueue.c:2537
>>>  kthread+0x2e8/0x3a0 kernel/kthread.c:376
>>>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>>>  </TASK>
>>>
>>> For bond devices which also have a master device, IFF_SLAVE flag is
>>> cleared at err_undo_flags label in bond_enslave() if it is not
>>> ARPHRD_ETHER type.  In this case, __bond_release_one() is not called
>>> when bond_netdev_event() received NETDEV_UNREGISTER event.  This
>>> causes the above warning.
>>>
>>> This patch fixes this issue by setting IFF_SLAVE flag at
>>> err_undo_flags label in bond_enslave() if the bond device has a master
>>> device.
>>>
>>
>> The proper way is to check if the bond device had the IFF_SLAVE flag before the
>> ether_setup() call which clears it, and restore it after.
>>
>>> Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure")
>>> Cc: Nikolay Aleksandrov <nikolay@...ulusnetworks.com>
>>> Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef [1]
>>> Reported-by: syzbot+9dfc3f3348729cc82277@...kaller.appspotmail.com
>>> Signed-off-by: Shigeru Yoshida <syoshida@...hat.com>
>>> ---
>>>  drivers/net/bonding/bond_main.c | 2 ++
>>>  include/net/bonding.h           | 5 +++++
>>>  2 files changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 00646aa315c3..1a8b59e1468d 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -2291,6 +2291,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>>>  			dev_close(bond_dev);
>>>  			ether_setup(bond_dev);
>>>  			bond_dev->flags |= IFF_MASTER;
>>> +			if (bond_has_master(bond))
>>> +				bond_dev->flags |= IFF_SLAVE;
>>>  			bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
>>>  		}
>>>  	}
>>> diff --git a/include/net/bonding.h b/include/net/bonding.h
>>> index ea36ab7f9e72..ed0b49501fad 100644
>>> --- a/include/net/bonding.h
>>> +++ b/include/net/bonding.h
>>> @@ -57,6 +57,11 @@
>>>  
>>>  #define bond_has_slaves(bond) !list_empty(bond_slave_list(bond))
>>>  
>>> +/* master list primitives */
>>> +#define bond_master_list(bond) (&(bond)->dev->adj_list.upper)
>>> +
>>> +#define bond_has_master(bond) !list_empty(bond_master_list(bond))
>>> +
>>
>> This is not the proper way to check for a master device.
>>
>>>  /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */
>>>  #define bond_first_slave(bond) \
>>>  	(bond_has_slaves(bond) ? \
>>
>> The device flags are wrong because of ether_setup() which clears IFF_SLAVE, we should
>> just check if it was present before and restore it after the ether_setup() call.
> 
> Thank you so much for your comment!  I understand your point, and
> agree that your approach must resolve the issue.
> 
> BTW, do you mean there is a case where a device has IFF_SLAVE flag but
> the upper list is empty?  I thought a device with IFF_SLAVE flag has a
> master device in the upper list (that is why I took the above way.)
> 

Hi Shigeru,
No, that's not what I meant. It's actually the opposite: a device may have a non-empty
upper list without having a "master" device or the slave flag set. Yes, if a device has
IFF_SLAVE set, then it must have a master upper device, but that's not what you're
checking for. You've reversed the logic: you check for any upper device and assume that
an IFF_SLAVE flag was set (which may not be true).
For an upper device to be considered a "master" device, it must have the master bool set to
true in its netdev_adjacent structure. We already have helpers to check for master devices
and to retrieve them, e.g. see netdev_master_upper_dev_get* in net/core/dev.c
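
To illustrate the distinction, here is a rough user-space model, not kernel code.
Every model_* name is made up for this example, and the linked list only stands in
for the real adjacency list. The kernel helpers differ in detail, so treat this as
a sketch of the idea: "has an upper device" and "has a master" are separate questions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an upper-list entry; only the 'master' bool matters here. */
struct model_adjacent {
	const char *name;
	bool master;                  /* true only for a master upper device */
	struct model_adjacent *next;  /* next entry in the modelled upper list */
};

/* Models the check from the patch: is the upper list non-empty? */
bool model_has_upper(const struct model_adjacent *upper)
{
	return upper != NULL;
}

/* Models a master lookup: return the entry marked as master, or NULL if none. */
const struct model_adjacent *model_master_upper(const struct model_adjacent *upper)
{
	for (; upper != NULL; upper = upper->next)
		if (upper->master)
			return upper;
	return NULL;
}

/* A device with only a non-master upper (e.g. a VLAN device stacked on top of it). */
void model_adjacent_demo(void)
{
	struct model_adjacent vlan = { "vlan10", false, NULL };

	assert(model_has_upper(&vlan));            /* the upper list is not empty */
	assert(model_master_upper(&vlan) == NULL); /* but there is no master */
}
```

In this model a list-empty test would report a "master" for the VLAN case, while
the lookup by the master bool does not.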

The most robust way to fix it is to check if the flag was there prior to the ether_setup() call
and restore it afterwards. Please also leave a nice comment explaining all of this. :)
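
Roughly, the save-and-restore idea looks like the user-space model below. This is
not the actual patch and not kernel code: the model_* functions and MODEL_* names
are hypothetical, the flag values are assumed to mirror include/uapi/linux/if.h,
and model_ether_setup() only imitates the fact that ether_setup() overwrites
dev->flags.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the IFF_* values in include/uapi/linux/if.h. */
#define MODEL_IFF_BROADCAST 0x0002u
#define MODEL_IFF_MASTER    0x0400u
#define MODEL_IFF_SLAVE     0x0800u
#define MODEL_IFF_MULTICAST 0x1000u

/* Hypothetical stand-in for struct net_device; only the flags are modelled. */
struct model_dev {
	unsigned int flags;
};

/* Imitates ether_setup(): the flags are overwritten, so IFF_SLAVE is lost. */
void model_ether_setup(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->flags = MODEL_IFF_BROADCAST | MODEL_IFF_MULTICAST;
}

/* Error path without the fix: only IFF_MASTER is put back. */
void model_undo_flags_buggy(struct model_dev *bond_dev)
{
	model_ether_setup(bond_dev);
	bond_dev->flags |= MODEL_IFF_MASTER;
}

/* Error path with the fix: remember IFF_SLAVE first, restore it afterwards. */
void model_undo_flags_fixed(struct model_dev *bond_dev)
{
	/* Save the slave flag because model_ether_setup() clears it. */
	unsigned int slave_flag = bond_dev->flags & MODEL_IFF_SLAVE;

	model_ether_setup(bond_dev);
	bond_dev->flags |= MODEL_IFF_MASTER | slave_flag;
}
```

With a device that starts out as both master and slave, the buggy variant ends up
without the slave flag and the fixed variant keeps it. A device that never had the
flag does not gain it in either variant.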

> Thanks,
> Shigeru
> 

Cheers,
 Nik
