Date:	Sun, 24 Jun 2007 22:24:30 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	netdev@...r.kernel.org
Cc:	"bugme-daemon@...nel-bugs.osdl.org" 
	<bugme-daemon@...nel-bugs.osdl.org>, ranko@...dernet.net
Subject: Re: [Bugme-new] [Bug 8668] New: HTB Deadlock

On Sun, 24 Jun 2007 21:57:19 -0700 (PDT) bugme-daemon@...zilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8668
> 
>            Summary: HTB Deadlock
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.19.7
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@...stprotocols.net
>         ReportedBy: ranko@...dernet.net
> 
> 
> Most recent kernel where this bug did not occur:
> Distribution:
> Hardware Environment:
> Software Environment:
> Problem Description:
> Greetings,
> 
> I've been experiencing problems with HTB where the whole machine locks
> up. This usually happens when the whole qdisc is being removed and
> occasionally when a leaf is being removed.
> 
> The common factor is that it always happens while some sort of removal
> is in progress.
> 
> Console output I have captured is at the end of this message. The same
> behavior exists in vanilla 2.6.19.7 and later. It is possible that the
> problem also exists in earlier versions; however, I did not go further
> back.
> 
> I also believe I have found where the actual problem is:
> 
> The qdisc_destroy() function is always called with dev->queue_lock held.
> htb_destroy(), further up the stack, calls del_timer_sync() to
> deactivate the HTB qdisc timers.

yep, I would agree with that analysis.  del_timer_sync() under a lock is
quite dangerous in this regard.

If the (misspelled) comment over htb_destroy() is true, current mainline
appears still to have this bug.
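
To make the ordering problem concrete, here is a minimal userspace sketch. It is an analogy only, not the sch_htb code and not a proposed patch: a pthread stands in for the timer softirq, a mutex stands in for dev->queue_lock, and pthread_join() plays the role of del_timer_sync(). All function and variable names are hypothetical. The deadlock-prone order is shown only in a comment; the function that is actually defined waits for the handler before taking the lock.

```c
/* Userspace analogy only: a pthread stands in for the timer softirq and a
 * mutex for dev->queue_lock. Every name below is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int timer_stop;    /* set to 1 so the "timer" stops rearming */
static atomic_int handler_runs;  /* number of times the handler body ran */
static pthread_t timer_tid;

/* Stand-in for htb_rate_timer(): every run needs queue_lock. */
static void *timer_handler_loop(void *unused)
{
    (void)unused;
    while (!atomic_load(&timer_stop)) {
        pthread_mutex_lock(&queue_lock);
        atomic_fetch_add(&handler_runs, 1);
        pthread_mutex_unlock(&queue_lock);
        sched_yield();
    }
    return NULL;
}

/* Start the stand-in timer; returns 0 on success. */
static int timer_start(void)
{
    atomic_store(&timer_stop, 0);
    atomic_store(&handler_runs, 0);
    return pthread_create(&timer_tid, NULL, timer_handler_loop, NULL);
}

/*
 * Deadlock-prone order (deliberately not implemented), as in the report:
 *   lock(queue_lock); stop = 1; join(timer);
 * If the handler is already blocked on queue_lock, the join never returns.
 *
 * Safe order: wait for the handler first, take the lock afterwards.
 */
static int destroy_safe(void)
{
    atomic_store(&timer_stop, 1);             /* prevent restarts */
    int err = pthread_join(timer_tid, NULL);  /* wait with the lock NOT held */
    if (err)
        return err;

    int rc = pthread_mutex_lock(&queue_lock); /* teardown under the lock */
    assert(rc == 0);
    (void)rc;
    /* ... queue state would be freed here ... */
    pthread_mutex_unlock(&queue_lock);
    return 0;
}
```

A caller would run timer_start(), let the handler execute for a while, and then call destroy_safe(). Whether the equivalent reordering is practical in the kernel depends on how the qdisc_destroy() callers take dev->queue_lock, which this sketch does not model.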


> From the comments in the source where del_timer_sync() is defined:
> 
> ---copy/paste---
> /**
>  * del_timer_sync - deactivate a timer and wait for the handler to finish.
>  * @timer: the timer to be deactivated
>  *
>  * This function only differs from del_timer() on SMP: besides deactivating
>  * the timer it also makes sure the handler has finished executing on other
>  * CPUs.
>  *
>  * Synchronization rules: Callers must prevent restarting of the timer,
>  * otherwise this function is meaningless. It must not be called from
>  * interrupt contexts. The caller must not hold locks which would prevent
>  * completion of the timer's handler. The timer's handler must not call
>  * add_timer_on(). Upon exit the timer is not queued and the handler is
>  * not running on any CPU.
>  *
>  * The function returns whether it has deactivated a pending timer or not.
>  */
> ---copy/paste---
> 
> Now, htb_rate_timer() does exactly what appears to be the source of the
> problem: it tries to obtain dev->queue_lock. Given the right moment
> (the timer handler fires while qdisc_destroy() is holding the lock), the
> system locks up: del_timer_sync() is waiting for the handler to finish
> while the handler is waiting for dev->queue_lock.
> 
> Of course I could also be completely wrong here and missing something
> not so obvious.
> 
> I could also attempt to fix this but I haven't dealt with this code in
> the past so I was hoping someone with better insight might just have an
> elegant solution up his sleeve.
> 
> Best regards,
> 
> Ranko
> 
> PS: If this is not the right place for this report - please let me
> know.
> 
> -----------CONSOLE (2.6.19.7)-----------
> BUG: soft lockup detected on CPU#3!
>  [<c013c890>] softlockup_tick+0x93/0xc2
>  [<c0127585>] update_process_times+0x26/0x5c
>  [<c0111cd5>] smp_apic_timer_interrupt+0x97/0xb2
>  [<c0104373>] apic_timer_interrupt+0x1f/0x24
>  [<c02e007b>] klist_next+0x4/0x8a
>  [<c02e2570>] _spin_unlock_irqrestore+0xa/0xc
>  [<c012729b>] try_to_del_timer_sync+0x47/0x4f
>  [<c01272b1>] del_timer_sync+0xe/0x14
>  [<f8b8a85b>] htb_destroy+0x20/0x7b [sch_htb]
>  [<c028f196>] qdisc_destroy+0x44/0x8d
>  [<f8b89645>] htb_destroy_class+0xd0/0x12d [sch_htb]
>  [<f8b895c7>] htb_destroy_class+0x52/0x12d [sch_htb]
>  [<f8b8a87a>] htb_destroy+0x3f/0x7b [sch_htb]
>  [<c028f196>] qdisc_destroy+0x44/0x8d
>  [<f8b89645>] htb_destroy_class+0xd0/0x12d [sch_htb]
>  [<f8b895c7>] htb_destroy_class+0x52/0x12d [sch_htb]
>  [<f8b8a87a>] htb_destroy+0x3f/0x7b [sch_htb]
>  [<c028f196>] qdisc_destroy+0x44/0x8d
>  [<c0290ba9>] tc_get_qdisc+0x1a3/0x1ef
>  [<c0290a06>] tc_get_qdisc+0x0/0x1ef
>  [<c028a366>] rtnetlink_rcv_msg+0x158/0x215
>  [<c028a20e>] rtnetlink_rcv_msg+0x0/0x215
>  [<c0294598>] netlink_run_queue+0x88/0x11d
>  [<c028a1c0>] rtnetlink_rcv+0x26/0x42
>  [<c0294b0c>] netlink_data_ready+0x12/0x54
>  [<c0293843>] netlink_sendskb+0x1c/0x33
>  [<c0294a11>] netlink_sendmsg+0x1ee/0x2d7
>  [<c0278ff7>] sock_sendmsg+0xe5/0x100
>  [<c01306b9>] autoremove_wake_function+0x0/0x37
>  [<c01306b9>] autoremove_wake_function+0x0/0x37
>  [<c0278ff7>] sock_sendmsg+0xe5/0x100
>  [<c01cd8be>] copy_from_user+0x33/0x69
>  [<c027913f>] sys_sendmsg+0x12d/0x243
>  [<c02e2564>] _read_unlock_irq+0x5/0x7
>  [<c013fb2b>] find_get_page+0x37/0x42
>  [<c01423dd>] filemap_nopage+0x30c/0x3a3
>  [<c014bb99>] __handle_mm_fault+0x21c/0x943
>  [<c02e24c5>] _spin_unlock_bh+0x5/0xd
>  [<c027b475>] sock_setsockopt+0x63/0x59d
>  [<c0151801>] anon_vma_prepare+0x1b/0xcb
>  [<c027a2ea>] sys_socketcall+0x24f/0x271
>  [<c02e3ad0>] do_page_fault+0x0/0x600
>  [<c01038f1>] sysenter_past_esp+0x56/0x79
>  =======================
> BUG: soft lockup detected on CPU#1!
>  [<c013c890>] softlockup_tick+0x93/0xc2
>  [<c0127585>] update_process_times+0x26/0x5c
>  [<c0111cd5>] smp_apic_timer_interrupt+0x97/0xb2
>  [<c0104373>] apic_timer_interrupt+0x1f/0x24
>  [<c01c007b>] blk_do_ordered+0x70/0x27e
>  [<c01ce788>] _raw_spin_lock+0xaa/0x13e
>  [<f8b8b422>] htb_rate_timer+0x18/0xc4 [sch_htb]
>  [<c0127539>] run_timer_softirq+0x163/0x189
>  [<f8b8b40a>] htb_rate_timer+0x0/0xc4 [sch_htb]
>  [<c0123315>] __do_softirq+0x70/0xdb
>  [<c01233bb>] do_softirq+0x3b/0x42
>  [<c0111cda>] smp_apic_timer_interrupt+0x9c/0xb2
>  [<c0104373>] apic_timer_interrupt+0x1f/0x24
>  [<c0101cc3>] mwait_idle_with_hints+0x3b/0x3f
>  [<c0101cd3>] mwait_idle+0xc/0x1b
>  [<c010271c>] cpu_idle+0x63/0x79
>  =======================
> BUG: soft lockup detected on CPU#2!
>  [<c013c890>] softlockup_tick+0x93/0xc2
>  [<c0127585>] update_process_times+0x26/0x5c
>  [<c0111cd5>] smp_apic_timer_interrupt+0x97/0xb2
>  [<c0104373>] apic_timer_interrupt+0x1f/0x24
>  [<c01c007b>] blk_do_ordered+0x70/0x27e
>  [<c01ce788>] _raw_spin_lock+0xaa/0x13e
>  [<c02846df>] dev_queue_xmit+0x53/0x2e4
>  [<c0286e20>] neigh_connected_output+0x80/0xa0
>  [<c02a213a>] ip_output+0x1b5/0x24b
>  [<c02a0b56>] ip_finish_output+0x0/0x192
>  [<c029dfef>] ip_forward+0x1c8/0x2b9
>  [<c029ddf0>] ip_forward_finish+0x0/0x37
>  [<c029c962>] ip_rcv+0x2a5/0x538
>  [<c029c100>] ip_rcv_finish+0x0/0x2aa
>  [<c027f3bc>] __netdev_alloc_skb+0x12/0x2a
>  [<c029c6bd>] ip_rcv+0x0/0x538
>  [<c0282a1e>] netif_receive_skb+0x218/0x318
>  [<c0270008>] bitmap_get_counter+0x41/0x1e6
>  [<f8a6146d>] e1000_clean_rx_irq+0x12c/0x4ef [e1000]
>  [<f8a61341>] e1000_clean_rx_irq+0x0/0x4ef [e1000]
>  [<f8a60612>] e1000_clean+0xe5/0x130 [e1000]
>  [<c0284573>] net_rx_action+0xbc/0x1d5
>  [<c0123315>] __do_softirq+0x70/0xdb
>  [<c01233bb>] do_softirq+0x3b/0x42
>  [<c01058c2>] do_IRQ+0x6c/0xda
>  [<c01042e2>] common_interrupt+0x1a/0x20
>  [<c0101cc3>] mwait_idle_with_hints+0x3b/0x3f
>  [<c0101cd3>] mwait_idle+0xc/0x1b
>  [<c010271c>] cpu_idle+0x63/0x79
>  =======================
> BUG: soft lockup detected on CPU#0!
>  [<c013c890>] softlockup_tick+0x93/0xc2
>  [<c0127585>] update_process_times+0x26/0x5c
>  [<c0111cd5>] smp_apic_timer_interrupt+0x97/0xb2
>  [<c0104373>] apic_timer_interrupt+0x1f/0x24
>  [<c01cd2eb>] delay_tsc+0x7/0x13
>  [<c01cd323>] __delay+0x6/0x7
>  [<c01ce796>] _raw_spin_lock+0xb8/0x13e
>  [<c02846df>] dev_queue_xmit+0x53/0x2e4
>  [<c0286e20>] neigh_connected_output+0x80/0xa0
>  [<c02a213a>] ip_output+0x1b5/0x24b
>  [<c02a0b56>] ip_finish_output+0x0/0x192
>  [<c029dfef>] ip_forward+0x1c8/0x2b9
>  [<c029ddf0>] ip_forward_finish+0x0/0x37
>  [<c029c962>] ip_rcv+0x2a5/0x538
>  [<c029c100>] ip_rcv_finish+0x0/0x2aa
>  [<c027e774>] __alloc_skb+0x47/0xf3
>  [<c029c6bd>] ip_rcv+0x0/0x538
>  [<c0282a1e>] netif_receive_skb+0x218/0x318
>  [<c0270008>] bitmap_get_counter+0x41/0x1e6
>  [<f88fac1d>] tg3_poll+0x6d3/0x906 [tg3]
>  [<c0284573>] net_rx_action+0xbc/0x1d5
>  [<c0123315>] __do_softirq+0x70/0xdb
>  [<c01233bb>] do_softirq+0x3b/0x42
>  [<c01058c2>] do_IRQ+0x6c/0xda
>  [<c01042e2>] common_interrupt+0x1a/0x20
>  [<c0101cc3>] mwait_idle_with_hints+0x3b/0x3f
>  [<c0101cd3>] mwait_idle+0xc/0x1b
>  [<c010271c>] cpu_idle+0x63/0x79
>  [<c03a9780>] start_kernel+0x353/0x423
>  [<c03a91cd>] unknown_bootoption+0x0/0x260
>  =======================
> -----------CONSOLE-----------
> 
> Steps to reproduce:
> 
> 