Date:	Wed, 15 Jul 2015 19:49:11 +0200
From:	Nikolay Aleksandrov <nikolay@...ulusnetworks.com>
To:	clsoto@...ux.vnet.ibm.com, davem@...emloft.net
CC:	netdev@...r.kernel.org, brking@...ux.vnet.ibm.com,
	j.vosburgh@...il.com, gospo@...ulusnetworks.com
Subject: Re: [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one

On 07/13/2015 11:10 PM, Nikolay Aleksandrov wrote:
> On 07/13/2015 11:05 PM, Nikolay Aleksandrov wrote:
>> On 07/13/2015 08:57 PM, clsoto@...ux.vnet.ibm.com wrote:
>>> From: Carol L Soto <clsoto@...ux.vnet.ibm.com>
>>>
>>> Add function bond_remove_proc_entry at __bond_release_one to avoid stack 
>>> trace at rmmod bonding.
>>>
>>> [68830.202239] remove_proc_entry: removing non-empty directory
>>> 'net/bonding', leaking at least 'bond0'
>>> [68830.202257] ------------[ cut here ]------------
>>> [68830.202260] WARNING: at fs/proc/generic.c:562
>>> [68830.202412] NIP [c0000000002abf6c] .remove_proc_entry+0x1fc/0x240
>>> [68830.202416] LR [c0000000002abf68] .remove_proc_entry+0x1f8/0x240
>>> [68830.202419] PACATMSCRATCH [8000000000009032]
>>> [68830.202421] Call Trace:
>>> [68830.202424] [c000000179277940] [c0000000002abf68] 
>>> .remove_proc_entry+0x1f8/0x240 (unreliable)
>>> [68830.202434] [c0000001792779f0] [d0000000053229a4] 
>>> .bond_destroy_proc_dir+0x34/0x54 [bonding]
>>> [68830.202440] [c000000179277a70] [d0000000053130e0] 
>>> .bond_net_exit+0x90/0x120 [bonding]
>>> [68830.202445] [c000000179277b10] [c00000000059944c] 
>>> .ops_exit_list.isra.0+0x6c/0xd0
>>> [68830.202450] [c000000179277ba0] [c000000000599774] 
>>> .unregister_pernet_operations+0x94/0x100
>>> [68830.202454] [c000000179277c40] [c000000000599814] 
>>> .unregister_pernet_subsys+0x34/0x60
>>> [68830.202460] [c000000179277cc0] [d000000005323758] 
>>> .bonding_exit+0x48/0x2328 [bonding]
>>> [68830.202466] [c000000179277d30] [c00000000010dcc4] 
>>> .SyS_delete_module+0x1f4/0x340
>>> [68830.202471] [c000000179277e30] [c000000000009e7c] 
>>> syscall_exit+0x0/0x7c
>>> [68830.202491] ---[ end trace 9bd1d810219c9875 ]---
>>>
>>> Signed-off-by: Carol L Soto <clsoto@...ux.vnet.ibm.com>
>>> ---
>>>  drivers/net/bonding/bond_main.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 19eb990..ace105a 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
>>>  		dev_set_mac_address(slave_dev, &addr);
>>>  	}
>>>  
>>> +	bond_remove_proc_entry(bond);
>>> +
>>>  	dev_set_mtu(slave_dev, slave->original_mtu);
>>>  
>>>  	slave_dev->priv_flags &= ~IFF_BONDING;
>>>
>>
>> This is incorrect, it tries to remove the bond entry on every slave release
>> so if we have a bonding device with >= 2 slaves and release one of them then
>> the whole bond device entry will be removed from /proc/net/bonding.
> <<<<>>>>
>> You can hit this case only if you created a bonding device while doing the
>> rmmod bonding (it's an old race condition which was fixed a long time ago, but
>> the procfs was apparently missed) and only after the notifier has been
>> unregistered but before the sysfs has been removed.
>>
> Scratch this part, it should be triggered in a different way.
> Could you provide a way to reproduce ?
> 
>> Since the bonding netdevice notifier is handling the procfs creation/destruction
>> we could try moving the unregister after the pernet destruction which should
>> help avoid such problems. Could you try the following patch:
>>
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 19eb990d398c..d515ee38b77f 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -4682,12 +4682,10 @@ err_link:
>>  
>>  static void __exit bonding_exit(void)
>>  {
>> -	unregister_netdevice_notifier(&bond_netdev_notifier);
>> -
>>  	bond_destroy_debugfs();
>> -
>>  	bond_netlink_fini();
>>  	unregister_pernet_subsys(&bond_net_ops);
>> +	unregister_netdevice_notifier(&bond_netdev_notifier);
>>  
>>  #ifdef CONFIG_NET_POLL_CONTROLLER
>>  	/* Make sure we don't have an imbalance on our netpoll blocking */
>>

After a private discussion, I was able to reproduce this issue with
tap devices (type != ARPHRD_ETHER, but they can still be enslaved):

[14446.539000] bond0: Releasing active interface tun1
[14446.548333] bond0 (unregistering): Released all slaves
[14446.564200] ------------[ cut here ]------------
[14446.564208] WARNING: CPU: 0 PID: 6319 at fs/proc/generic.c:575 remove_proc_entry+0x112/0x160()
[14446.564211] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0'
[14446.564212] Modules linked in: tun bonding(-) eql(O) 9p stp llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev hid_generic usbhid hid snd_hda_codec_generic ppdev aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse pcspkr snd_hda_intel evdev snd_hda_codec snd_hwdep qxl snd_hda_core serio_raw snd_pcm snd_timer snd 9pnet_virtio 9pnet virtio_balloon soundcore i2c_piix4 drm_kms_helper ttm drm i2c_core virtio_console pvpanic parport_pc parport acpi_cpufreq processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk virtio_net e1000 ata_piix floppy ehci_pci uhci_hcd ehci_hcd libata scsi_mod usbcore usb_common virtio_pci virtio_ring virtio [last unloaded: bridge]

[14446.564295] CPU: 0 PID: 6319 Comm: rmmod Tainted: G           O    4.2.0-rc2+ #6
[14446.564296] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[14446.564299]  0000000000000000 ffffffff81732d41 ffffffff81525b34 ffff880035bcbda8
[14446.564302]  ffffffff8106c521 ffff8800367c5f78 ffff8800367c5f40 ffff88003e3a4280
[14446.564304]  ffffffffa05a5040 0000000000000000 ffffffff8106c59a ffffffff8172ebd0
[14446.564307] Call Trace:
[14446.564313]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
[14446.564317]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
[14446.564320]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
[14446.564323]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
[14446.564329]  [<ffffffffa059d0d6>] ? bond_destroy_proc_dir+0x26/0x30 [bonding]
[14446.564332]  [<ffffffffa058d40e>] ? bond_net_exit+0x8e/0xa0 [bonding]
[14446.564336]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
[14446.564340]  [<ffffffff8142f52d>] ? unregister_pernet_operations+0x8d/0xd0
[14446.564343]  [<ffffffff8142f58d>] ? unregister_pernet_subsys+0x1d/0x30
[14446.564346]  [<ffffffffa059d259>] ? bonding_exit+0x23/0xdca [bonding]
[14446.564350]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
[14446.564354]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
[14446.564357]  [<ffffffff8152b732>] ? entry_SYSCALL_64_fastpath+0x16/0x75
[14446.564360] ---[ end trace a911dbcedf315986 ]---

The problem is in bond_release_and_destroy(). I'll post a proper fix
for -net in a few minutes, once I have run some tests.
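
For anyone following the trace, here is a rough user-space sketch of the
condition behind the warning. It is a toy model only: it is neither kernel
code nor the fix, and struct toy_proc_dir and all toy_* helpers are
hypothetical names made up for illustration. The assumption is simply that
the per-bond entry (e.g. 'bond0') has to be removed exactly once, when the
bond itself goes away, and before the 'net/bonding' directory is torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8
#define TOY_NAME_LEN 16

/* Toy stand-in for the net/bonding proc directory (hypothetical, not kernel code). */
struct toy_proc_dir {
	char entries[TOY_MAX_ENTRIES][TOY_NAME_LEN];
	int count;
};

/* Return the index of 'name' in the directory, or -1 if it is absent. */
int toy_find(const struct toy_proc_dir *dir, const char *name)
{
	assert(dir != NULL && name != NULL);
	for (int i = 0; i < dir->count; i++)
		if (strcmp(dir->entries[i], name) == 0)
			return i;
	return -1;
}

/* Add a per-bond entry such as "bond0"; fails if full or already present. */
bool toy_add_entry(struct toy_proc_dir *dir, const char *name)
{
	if (dir->count >= TOY_MAX_ENTRIES || toy_find(dir, name) >= 0)
		return false;
	snprintf(dir->entries[dir->count], TOY_NAME_LEN, "%s", name);
	dir->count++;
	return true;
}

/* Remove a per-bond entry; fails if it does not exist (e.g. removed twice). */
bool toy_remove_entry(struct toy_proc_dir *dir, const char *name)
{
	int i = toy_find(dir, name);

	if (i < 0)
		return false;
	/* Fill the freed slot with the last entry to keep the array dense. */
	if (i != dir->count - 1)
		memcpy(dir->entries[i], dir->entries[dir->count - 1], TOY_NAME_LEN);
	dir->count--;
	return true;
}

/*
 * Tear down the directory. If entries are still present, print a message
 * modeled on the one in the trace and return how many were leaked.
 */
int toy_remove_dir(struct toy_proc_dir *dir)
{
	int leaked = dir->count;

	if (leaked > 0)
		fprintf(stderr,
			"removing non-empty directory 'net/bonding', leaking at least '%s'\n",
			dir->entries[0]);
	dir->count = 0;
	return leaked;
}
```

In this model, calling toy_remove_dir() while 'bond0' is still present
reports one leaked entry, which corresponds to the warning above. Calling
toy_remove_entry() on every slave release, as the original patch does, would
instead drop 'bond0' while the bond still exists, and a second call for the
same name would fail.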

Cheers,
 Nik