Message-ID: <4A44D1FC.8090001@onet.eu>
Date:	Fri, 26 Jun 2009 15:49:48 +0200
From:	sdrb <sdrb@...t.eu>
To:	Jarek Poplawski <jarkao2@...il.com>
CC:	netdev@...r.kernel.org
Subject: Re: hunging ifenslave command

Jarek Poplawski wrote:
> sdrb wrote, On 06/18/2009 03:15 PM:
> 
>> Hello,
>>
>> I have a problem with the "ifenslave" command hanging.
>> I configured a bond0 interface with 3 slave interfaces: eth0, eth1 and 
>> eth2. When I remove one of them, sometimes only the "ifenslave" 
>> command hangs, but sometimes the whole system hangs completely, 
>> so it's not even possible to type on the console.
>>
>> I'm using linux kernel 2.6.27.10 with bonding driver version v3.3.0 
>> (June 10, 2008) and ethernet card driver r8168 version 8.006.00-NAPI.
>>
>> Does anyone know what the problem is?
> 
> 
> Hi,
> 
> I don't know, but I guess, if anyone knew it would be fixed now. So, I'd
> recommend trying the current stable (2.6.30), and if no difference, maybe
> some debugging like turning on lockdep (lock debugging with prove
> locking correctness). If still nothing reported, try to get a few SysRq
> logs when it happens e.g. Alt-PrtScr with t, d, w, q, and send them with
> .config and dmesg (gzipped or as attachments to the bugzilla report).

OK, I dug a little into the 2.6.27.10 kernel and took the newest 
driver (version 8.012.00) from the Realtek website.
Sorry, I haven't tested it under 2.6.30, because I had to fix it 
specifically for 2.6.27.10.

I investigated the problem and noticed that it is probably related to 
rtnl_lock().
Below are the backtraces of the three blocked tasks that I got from the logs:


<6>SysRq : Show Blocked State
<6>  task                        PC stack   pid father
<6>events/2      D ffff88003e155d50     0    13      2
<0> ffff88003e155d20 0000000000000046 0000000000000000 ffff88003e2fe15d
<0> 0000000000000001 ffff88003e0c6140 ffff88003e155cb8 00000001000e5496
<0> ffff88003e150430 ffff88003e150200 0000000000000001 0000000000000000
<0>Call Trace:
<0> [<ffffffff806cddf5>] mutex_lock_nested+0xe5/0x290
<0> [<ffffffff806204d2>] ? rtnl_lock+0x12/0x20
<0> [<ffffffff8025d28d>] ? trace_hardirqs_on+0xd/0x10
<0> [<ffffffff80623060>] ? linkwatch_event+0x0/0x40
<0> [<ffffffff806204d2>] rtnl_lock+0x12/0x20
<0> [<ffffffff8062306d>] linkwatch_event+0xd/0x40
<0> [<ffffffff80249c39>] ? run_workqueue+0x19/0x210
<0> [<ffffffff80249d07>] run_workqueue+0xe7/0x210
<0> [<ffffffff80249cb4>] ? run_workqueue+0x94/0x210
<0> [<ffffffff8025d28d>] ? trace_hardirqs_on+0xd/0x10
<0> [<ffffffff80249ecc>] worker_thread+0x9c/0xf0
<0> [<ffffffff8024e180>] ? autoremove_wake_function+0x0/0x40
<0> [<ffffffff8025d28d>] ? trace_hardirqs_on+0xd/0x10
<0> [<ffffffff8024e180>] ? autoremove_wake_function+0x0/0x40
<0> [<ffffffff80249e30>] ? worker_thread+0x0/0xf0
<0> [<ffffffff8024d9f8>] kthread+0x68/0xa0
<0> [<ffffffff8020d3b9>] child_rip+0xa/0x11
<0> [<ffffffff8020c9ef>] ? restore_args+0x0/0x30
<0> [<ffffffff8024d990>] ? kthread+0x0/0xa0
<0> [<ffffffff8020d3af>] ? child_rip+0x0/0x11
<0>
<6>snmpd         D ffff88003e477c68     0 10287      1
<0> ffff88003e477c38 0000000000200046 0000000000000000 ffff88003e1e3160
<0> ffffffff80231d50 ffff88003e122fa0 ffff88003e477bd0 00000001000e556a
<0> ffff88003e1e3390 ffff88003e1e3160 000000003e1e3160 0000000000000000
<0>Call Trace:
<0> [<ffffffff80231d50>] ? default_wake_function+0x0/0x10
<0> [<ffffffff806cddf5>] mutex_lock_nested+0xe5/0x290
<0> [<ffffffff806204d2>] ? rtnl_lock+0x12/0x20
<0> [<ffffffff806204d2>] rtnl_lock+0x12/0x20
<0> [<ffffffff806186f0>] dev_ioctl+0x1b0/0x540
<0> [<ffffffff80607f08>] sock_ioctl+0x128/0x250
<0> [<ffffffff802b4d22>] vfs_ioctl+0xa2/0xc0
<0> [<ffffffff802b4dcb>] do_vfs_ioctl+0x8b/0x2d0
<0> [<ffffffff802b5092>] sys_ioctl+0x82/0xa0
<0> [<ffffffff802e105f>] dev_ifconf+0xef/0x230
<0> [<ffffffff802e33d9>] compat_sys_ioctl+0x2e9/0x3e0
<0> [<ffffffff806cf87d>] ? lockdep_sys_exit_thunk+0x35/0x67
<0> [<ffffffff806cf807>] ? trace_hardirqs_on_thunk+0x3a/0x3f
<0> [<ffffffff80229f52>] ia32_sysret+0x0/0xa
<0>
<6>ifenslave     D ffff880027425a50     0 14957  14950
<0> ffff880027425908 0000000000000046 0000000000000000 ffff8800010eeb80
<0> ffff8800010eeb80 ffff88003e0c6140 ffff8800274258a0 00000001000e54a3
<0> ffff88002f69c430 ffff88002f69c200 00000000010eec18 0000000000000000
<0>Call Trace:
<0> [<ffffffff8022f990>] ? finish_task_switch+0x0/0xe0
<0> [<ffffffff806cda06>] schedule_timeout+0xb6/0xc0
<0> [<ffffffff8025d28d>] ? trace_hardirqs_on+0xd/0x10
<0> [<ffffffff806cffeb>] ? _spin_unlock_irq+0x2b/0x40
<0> [<ffffffff806cd52c>] wait_for_common+0xcc/0x1a0
<0> [<ffffffff80231d50>] ? default_wake_function+0x0/0x10
<0> [<ffffffff80231e2e>] ? __wake_up+0x4e/0x70
<0> [<ffffffff80231d50>] ? default_wake_function+0x0/0x10
<0> [<ffffffff806cd618>] wait_for_completion+0x18/0x20
<0> [<ffffffff8024a04b>] flush_cpu_workqueue+0x8b/0xb0
<0> [<ffffffff80249f20>] ? wq_barrier_func+0x0/0x10
<0> [<ffffffff8024a0da>] flush_workqueue+0x6a/0x90
<0> [<ffffffff8024a070>] ? flush_workqueue+0x0/0x90
<0> [<ffffffff8024a590>] flush_scheduled_work+0x10/0x20
<0> [<ffffffffa006e3b0>] rtl8168_down+0x60/0xf0 [r8168]
<0> [<ffffffffa006e46f>] rtl8168_close+0x2f/0xc0 [r8168]
<0> [<ffffffff8061512f>] dev_close+0x6f/0xa0
<0> [<ffffffffa0102fcd>] bond_release+0x21d/0x410 [bonding]
<0> [<ffffffff806cffb6>] ? _read_unlock+0x26/0x30
<0> [<ffffffffa0105fab>] bond_do_ioctl+0x4cb/0x540 [bonding]
<0> [<ffffffff806cdec8>] ? mutex_lock_nested+0x1b8/0x290
<0> [<ffffffff806204d2>] ? rtnl_lock+0x12/0x20
<0> [<ffffffff8061838a>] dev_ifsioc+0x12a/0x2e0
<0> [<ffffffff806186ca>] dev_ioctl+0x18a/0x540
<0> [<ffffffffa002387a>] ? aufs_fault+0x14a/0x310 [aufs]
<0> [<ffffffff80607f08>] sock_ioctl+0x128/0x250
<0> [<ffffffff802b4d22>] vfs_ioctl+0xa2/0xc0
<0> [<ffffffff802b4dcb>] do_vfs_ioctl+0x8b/0x2d0
<0> [<ffffffff802b5092>] sys_ioctl+0x82/0xa0
<0> [<ffffffff802e1362>] bond_ioctl+0x122/0x140
<0> [<ffffffff802e33d9>] compat_sys_ioctl+0x2e9/0x3e0
<0> [<ffffffff806cf87d>] ? lockdep_sys_exit_thunk+0x35/0x67
<0> [<ffffffff806cf807>] ? trace_hardirqs_on_thunk+0x3a/0x3f
<0> [<ffffffff80229f52>] ia32_sysret+0x0/0xa


I've made a patch for the r8168 driver and it seems to work, but I'm not 
sure whether I did it correctly or whether the solution is too dangerous :)
The patch is attached. With it, the "ifenslave" command no longer 
hangs as it did before.
Could anyone review it?


sdrb


View attachment "r8168_n.c.diff" of type "text/plain" (399 bytes)
