Message-ID: <51356AC1.4090302@gmail.com>
Date:	Tue, 05 Mar 2013 11:47:13 +0800
From:	Cong Wang <xiyou.wangcong@...il.com>
To:	dormando <dormando@...ia.net>
CC:	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: BUG: IPv4: Attempt to release TCP socket in state 1

(Cc'ing the right netdev mailing list...)

On 03/05/2013 08:01 AM, dormando wrote:
> Hi!
>
> I have a (core lockup?) with 3.7.6+ and 3.8.2 which appears to be under
> ixgbe. The machine appears to still be up but network stays in a severely
> hobbled state. Either lagging or not responding to the network at all.
>
> On a new box the hang happens within 8-24 hours of giving it production
> network traffic. On an older machine (6 cores instead of 8, etc) it can
> run for a week or more before hanging.
>
> The hang from 3.7 might be slightly different than 3.8. They seem to be
> mostly the same aside from 3.8 hanging in the GRO path. Don't see anything
> obvious in 3.9-rc1 that would fix it, and haven't tried 3.9-rc1.
>
> I've not yet figured out how to reproduce outside of production (as
> always, sigh). This doesn't seem to happen with 3.6.6, but we have
> different and less frequent kernel panics there.
>
> From 3.7:
>
> [21934.669780] IPv4: Attempt to release TCP socket in state 1
> ffff882785e3db00
> [21969.265883] ------------[ cut here ]------------
> [21969.265898] WARNING: at net/sched/sch_generic.c:255
> dev_watchdog+0x258/0x270()
> [21969.265900] Hardware name: X9DR3-F
> [21969.265902] NETDEV WATCHDOG: eth2 (ixgbe): transmit queue 11 timed out
> [21969.265903] Modules linked in: macvlan bridge ipmi_watchdog
> ipmi_devintf coretemp ghash_clmulni_intel gpio_ich microcode ixgbe sb_edac
> mdio lpc_ich edac_core mei mfd_core ipmi_si ipmi_msghandler isci libsas
> igb
> [21969.265930] Pid: 0, comm: swapper/10 Not tainted 3.7.8 #1
> [21969.265931] Call Trace:
> [21969.265933]  <IRQ>  [<ffffffff810484ff>] warn_slowpath_common+0x7f/0xc0
> [21969.265945]  [<ffffffff815a712e>] ? ip_local_deliver_finish+0xde/0x290
> [21969.265948]  [<ffffffff810485f6>] warn_slowpath_fmt+0x46/0x50
> [21969.265950]  [<ffffffff815a69b9>] ? ip_rcv_finish+0x119/0x360
> [21969.265953]  [<ffffffff8157d538>] dev_watchdog+0x258/0x270
> [21969.265956]  [<ffffffff8157d2e0>] ? __netdev_watchdog_up+0x80/0x80
> [21969.265960]  [<ffffffff81058349>] call_timer_fn+0x49/0x130
> [21969.265963]  [<ffffffff81078f9f>] ? scheduler_tick+0x15f/0x190
> [21969.265965]  [<ffffffff81058944>] run_timer_softirq+0x224/0x290
> [21969.265967]  [<ffffffff81058066>] ? update_process_times+0x76/0x90
> [21969.265969]  [<ffffffff8157d2e0>] ? __netdev_watchdog_up+0x80/0x80
> [21969.265974]  [<ffffffff8108b4f4>] ? ktime_get+0x54/0xe0
> [21969.265977]  [<ffffffff810509c7>] __do_softirq+0xc7/0x230
> [21969.265990]  [<ffffffff816757cc>] call_softirq+0x1c/0x30
> [21969.265995]  [<ffffffff81004475>] do_softirq+0x55/0x90
> [21969.265997]  [<ffffffff810507c5>] irq_exit+0x85/0xa0
> [21969.265999]  [<ffffffff81675dfe>] smp_apic_timer_interrupt+0x6e/0x99
> [21969.266002]  [<ffffffff816751ca>] apic_timer_interrupt+0x6a/0x70
> [21969.266003]  <EOI>  [<ffffffff8166b17a>] ? __schedule+0x3aa/0x750
> [21969.266011]  [<ffffffff8100b2ed>] ? mwait_idle+0xad/0x1f0
> [21969.266013]  [<ffffffff8100a7a3>] cpu_idle+0xb3/0x100
> [21969.266017]  [<ffffffff816632e3>] start_secondary+0x1c9/0x1d0
> [21969.266019] ---[ end trace 0739ad788910e77e ]---
> [21969.266059] ixgbe 0000:83:00.0 eth2: Reset adapter
> [22019.676899] INFO: rcu_sched self-detected stall on CPU { 30}  (t=15001
> jiffies)
> [22019.676963] Pid: 0, comm: swapper/30 Tainted: G        W    3.7.8 #1
> [22019.676966] Call Trace:
> [22019.676968]  <IRQ>  [<ffffffff810bb144>]
> rcu_check_callbacks+0x1b4/0x600
> [22019.676985]  [<ffffffff8107e1b8>] ? account_system_time+0xe8/0x1e0
> [22019.676988]  [<ffffffff81058038>] update_process_times+0x48/0x90
> [22019.676993]  [<ffffffff81092aa7>] tick_sched_timer+0x77/0x160
> [22019.677006]  [<ffffffff8106f66d>] __run_hrtimer+0x7d/0x1c0
> [22019.677008]  [<ffffffff81092a30>] ? tick_setup_sched_timer+0x110/0x110
> [22019.677010]  [<ffffffff8106fa26>] hrtimer_interrupt+0xf6/0x230
> [22019.677015]  [<ffffffff81675df9>] smp_apic_timer_interrupt+0x69/0x99
> [22019.677018]  [<ffffffff816751ca>] apic_timer_interrupt+0x6a/0x70
> [22019.677023]  [<ffffffff815afaa0>] ?
> __inet_lookup_established+0xc0/0x280
> [22019.677026]  [<ffffffff815a68a0>] ? inet_del_protocol+0x40/0x40
> [22019.677030]  [<ffffffff815cc383>] tcp_v4_early_demux+0xa3/0x170
> [22019.677033]  [<ffffffff815a69ed>] ip_rcv_finish+0x14d/0x360
> [22019.677035]  [<ffffffff815a6f66>] ip_rcv+0x226/0x310
> [22019.677041]  [<ffffffff815609f2>] __netif_receive_skb+0x492/0x640
> [22019.677043]  [<ffffffff81074209>] ? __wake_up_common+0x59/0x90
> [22019.677051]  [<ffffffffa00f284b>] ? ixgbe_poll+0xe3b/0x1140 [ixgbe]
> [22019.677054]  [<ffffffff81560c94>] process_backlog+0xf4/0x1e0
> [22019.677056]  [<ffffffff815619c5>] net_rx_action+0xf5/0x260
> [22019.677070]  [<ffffffff810509c7>] __do_softirq+0xc7/0x230
> [22019.677072]  [<ffffffff816757cc>] call_softirq+0x1c/0x30
> [22019.677076]  [<ffffffff81004475>] do_softirq+0x55/0x90
> [22019.677078]  [<ffffffff810507c5>] irq_exit+0x85/0xa0
> [22019.677080]  [<ffffffff81675d16>] do_IRQ+0x66/0xe0
> [22019.677084]  [<ffffffff8166c8aa>] common_interrupt+0x6a/0x6a
> [22019.677085]  <EOI>  [<ffffffff8166b17a>] ? __schedule+0x3aa/0x750
> [22019.677090]  [<ffffffff8100b2ed>] ? mwait_idle+0xad/0x1f0
> [22019.677092]  [<ffffffff8100a7a3>] cpu_idle+0xb3/0x100
> [22019.677096]  [<ffffffff816632e3>] start_secondary+0x1c9/0x1d0
> [22188.695704] INFO: task kworker/10:2:676 blocked for more than 120
> seconds.
> [22188.695750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [22188.695807] kworker/10:2    D ffffffff81806e40     0   676      2
> 0x00000000
> [22188.695813]  ffff882ff9dadad8 0000000000000046 ffff882ff9082d80
> 00000000000126c0
> [22188.695816]  ffff882ff9dadfd8 ffff882ff9dac010 00000000000126c0
> 00000000000126c0
> [22188.695818]  ffff882ff9dadfd8 00000000000126c0 ffff882ffb185b00
> ffff882ff9082d80
> [22188.695820] Call Trace:
> [22188.695830]  [<ffffffff8166b5e9>] schedule+0x29/0x70
> [22188.695833]  [<ffffffff816698e5>] schedule_timeout+0x165/0x200
> [22188.695838]  [<ffffffff810796b5>] ? ttwu_do_wakeup+0x45/0x100
> [22188.695840]  [<ffffffff810797b9>] ? T.1871+0x49/0x60
> [22188.695843]  [<ffffffff8107c28e>] ? try_to_wake_up+0x23e/0x2b0
> [22188.695845]  [<ffffffff8166ac58>] wait_for_common+0xc8/0x160
> [22188.695847]  [<ffffffff8107c300>] ? try_to_wake_up+0x2b0/0x2b0
> [22188.695852]  [<ffffffff810b90c0>] ? rcu_cpu_stall_reset+0x60/0x60
> [22188.695854]  [<ffffffff8166adcd>] wait_for_completion+0x1d/0x20
> [22188.695859]  [<ffffffff810aed96>] __stop_cpus+0x56/0x80
> [22188.695861]  [<ffffffff810b90c0>] ? rcu_cpu_stall_reset+0x60/0x60
> [22188.695864]  [<ffffffff810aee0d>] try_stop_cpus+0x4d/0x80
> [22188.695867]  [<ffffffff810bb62a>]
> synchronize_sched_expedited+0x9a/0x120
> [22188.695869]  [<ffffffff810bb6be>] synchronize_rcu_expedited+0xe/0x10
> [22188.695874]  [<ffffffff8155a8e5>] synchronize_net+0x25/0x30
> [22188.695880]  [<ffffffff8157dbb4>] dev_deactivate_many+0x254/0x260
> [22188.695882]  [<ffffffff8157dbed>] dev_deactivate+0x2d/0x40
> [22188.695886]  [<ffffffff8156fff4>] linkwatch_do_dev+0x34/0x60
> [22188.695888]  [<ffffffff815701d3>] __linkwatch_run_queue+0xf3/0x1e0
> [22188.695891]  [<ffffffff815702e5>] linkwatch_event+0x25/0x30
> [22188.695894]  [<ffffffff81064180>] process_one_work+0x160/0x460
> [22188.695896]  [<ffffffff815702c0>] ? __linkwatch_run_queue+0x1e0/0x1e0
> [22188.695899]  [<ffffffff8106631b>] worker_thread+0x12b/0x3d0
> [22188.695901]  [<ffffffff810661f0>] ? manage_workers+0x300/0x300
> [22188.695904]  [<ffffffff8106b26e>] kthread+0xce/0xe0
> [22188.695907]  [<ffffffff8106b1a0>] ?
> kthread_freezable_should_stop+0x70/0x70
> [22188.695911]  [<ffffffff8167475c>] ret_from_fork+0x7c/0xb0
> [22188.695913]  [<ffffffff8106b1a0>] ?
> kthread_freezable_should_stop+0x70/0x70
>
> [tons of processes hung in a similar way]
>
> Then every few hundred seconds swapper bails:
>
> [22919.239167] INFO: rcu_sched self-detected stall on CPU { 30}  (t=240021
> jiffies)
> [22919.239409] Pid: 0, comm: swapper/30 Tainted: G        W    3.7.8 #1
> [22919.239411] Call Trace:
> [22919.239413]  <IRQ>  [<ffffffff810bb144>]
> rcu_check_callbacks+0x1b4/0x600
> [22919.239430]  [<ffffffff8107e1b8>] ? account_system_time+0xe8/0x1e0
> [22919.239434]  [<ffffffff81058038>] update_process_times+0x48/0x90
> [22919.239439]  [<ffffffff81092aa7>] tick_sched_timer+0x77/0x160
> [22919.239442]  [<ffffffff8106f66d>] __run_hrtimer+0x7d/0x1c0
> [22919.239445]  [<ffffffff81092a30>] ? tick_setup_sched_timer+0x110/0x110
> [22919.239447]  [<ffffffff8106fa26>] hrtimer_interrupt+0xf6/0x230
> [22919.239453]  [<ffffffff81675df9>] smp_apic_timer_interrupt+0x69/0x99
> [22919.239455]  [<ffffffff816751ca>] apic_timer_interrupt+0x6a/0x70
> [22919.239461]  [<ffffffff815afaab>] ?
> __inet_lookup_established+0xcb/0x280
> [22919.239463]  [<ffffffff815a68a0>] ? inet_del_protocol+0x40/0x40
> [22919.239468]  [<ffffffff815cc383>] tcp_v4_early_demux+0xa3/0x170
> [22919.239470]  [<ffffffff815a69ed>] ip_rcv_finish+0x14d/0x360
> [22919.239472]  [<ffffffff815a6f66>] ip_rcv+0x226/0x310
> [22919.239478]  [<ffffffff815609f2>] __netif_receive_skb+0x492/0x640
> [22919.239481]  [<ffffffff81074209>] ? __wake_up_common+0x59/0x90
> [22919.239490]  [<ffffffffa00f284b>] ? ixgbe_poll+0xe3b/0x1140 [ixgbe]
> [22919.239493]  [<ffffffff81560c94>] process_backlog+0xf4/0x1e0
> [22919.239495]  [<ffffffff815619c5>] net_rx_action+0xf5/0x260
> [22919.239499]  [<ffffffff810509c7>] __do_softirq+0xc7/0x230
> [22919.239501]  [<ffffffff816757cc>] call_softirq+0x1c/0x30
> [22919.239505]  [<ffffffff81004475>] do_softirq+0x55/0x90
> [22919.239507]  [<ffffffff810507c5>] irq_exit+0x85/0xa0
> [22919.239509]  [<ffffffff81675d16>] do_IRQ+0x66/0xe0
> [22919.239513]  [<ffffffff8166c8aa>] common_interrupt+0x6a/0x6a
> [22919.239514]  <EOI>  [<ffffffff8166b17a>] ? __schedule+0x3aa/0x750
> [22919.239520]  [<ffffffff8100b2ed>] ? mwait_idle+0xad/0x1f0
> [22919.239522]  [<ffffffff8100a7a3>] cpu_idle+0xb3/0x100
> [22919.239526]  [<ffffffff816632e3>] start_secondary+0x1c9/0x1d0
> [23099.151590] INFO: rcu_sched self-detected stall on CPU { 30}  (t=285025
> jiffies)
> [23099.151823] Pid: 0, comm: swapper/30 Tainted: G        W
> [23099.151825] Call Trace:
> [23099.151827]  <IRQ>  [<ffffffff810bb144>]
> rcu_check_callbacks+0x1b4/0x600
> [23099.151841]  [<ffffffff8107e1b8>] ? account_system_time+0xe8/0x1e0
> [23099.151845]  [<ffffffff81058038>] update_process_times+0x48/0x90
> [23099.151849]  [<ffffffff81092aa7>] tick_sched_timer+0x77/0x160
> [23099.151853]  [<ffffffff8106f66d>] __run_hrtimer+0x7d/0x1c0
> [23099.151856]  [<ffffffff81092a30>] ? tick_setup_sched_timer+0x110/0x110
> [23099.151857]  [<ffffffff8106fa26>] hrtimer_interrupt+0xf6/0x230
> [23099.151863]  [<ffffffff81675df9>] smp_apic_timer_interrupt+0x69/0x99
> [23099.151865]  [<ffffffff816751ca>] apic_timer_interrupt+0x6a/0x70
> [23099.151870]  [<ffffffff815afb53>] ?
> __inet_lookup_established+0x173/0x280
> [23099.151873]  [<ffffffff815a68a0>] ? inet_del_protocol+0x40/0x40
> [23099.151877]  [<ffffffff815cc383>] tcp_v4_early_demux+0xa3/0x170
> [23099.151880]  [<ffffffff815a69ed>] ip_rcv_finish+0x14d/0x360
> [23099.151882]  [<ffffffff815a6f66>] ip_rcv+0x226/0x310
> [23099.151887]  [<ffffffff815609f2>] __netif_receive_skb+0x492/0x640
> [23099.151890]  [<ffffffff81074209>] ? __wake_up_common+0x59/0x90
> [23099.151897]  [<ffffffffa00f284b>] ? ixgbe_poll+0xe3b/0x1140 [ixgbe]
> [23099.151900]  [<ffffffff81560c94>] process_backlog+0xf4/0x1e0
> [23099.151902]  [<ffffffff815619c5>] net_rx_action+0xf5/0x260
> [23099.151906]  [<ffffffff810509c7>] __do_softirq+0xc7/0x230
> [23099.151908]  [<ffffffff816757cc>] call_softirq+0x1c/0x30
> [23099.151912]  [<ffffffff81004475>] do_softirq+0x55/0x90
> [23099.151914]  [<ffffffff810507c5>] irq_exit+0x85/0xa0
> [23099.151916]  [<ffffffff81675d16>] do_IRQ+0x66/0xe0
> [23099.151920]  [<ffffffff8166c8aa>] common_interrupt+0x6a/0x6a
> [23099.151920]  <EOI>  [<ffffffff8166b17a>] ? __schedule+0x3aa/0x750
> [23099.151926]  [<ffffffff8100b2ed>] ? mwait_idle+0xad/0x1f0
> [23099.151928]  [<ffffffff8100a7a3>] cpu_idle+0xb3/0x100
> [23099.151931]  [<ffffffff816632e3>] start_secondary+0x1c9/0x1d0
>
> Under 3.8.2:
>
> [33486.326977] IPv4: Attempt to release TCP socket in state 1
> ffff883269ea2300
> [33486.342971] IPv4: Attempt to release TCP socket in state 1
> ffff8835efccbf00
> [33505.595925] ------------[ cut here ]------------
> [33505.595934] WARNING: at net/sched/sch_generic.c:254
> dev_watchdog+0x258/0x270()
> [33505.595935] Hardware name: X9DR3-F
> [33505.595937] NETDEV WATCHDOG: eth2 (ixgbe): transmit queue 0 timed out
> [33505.595938] Modules linked in: macvlan iptable_nat nf_nat_ipv4 nf_nat
> bridge coretemp ghash_clmulni_intel gpio_ich ixgbe microcode sb_edac mei
> lpc_ich edac_core mfd_core mdio isci libsas igb ptp pps_core
> [33505.595951] Pid: 0, comm: swapper/4 Not tainted 3.8.2 #2
> [33505.595952] Call Trace:
> [33505.595954]  <IRQ>  [<ffffffff8104964f>] warn_slowpath_common+0x7f/0xc0
> [33505.595960]  [<ffffffff81049746>] warn_slowpath_fmt+0x46/0x50
> [33505.595962]  [<ffffffff815a1548>] dev_watchdog+0x258/0x270
> [33505.595965]  [<ffffffff815a12f0>] ? __netdev_watchdog_up+0x80/0x80
> [33505.595968]  [<ffffffff81059259>] call_timer_fn+0x49/0x130
> [33505.595972]  [<ffffffff8107a07f>] ? scheduler_tick+0x15f/0x190
> [33505.595974]  [<ffffffff81059854>] run_timer_softirq+0x224/0x290
> [33505.595976]  [<ffffffff81058f76>] ? update_process_times+0x76/0x90
> [33505.595978]  [<ffffffff815a12f0>] ? __netdev_watchdog_up+0x80/0x80
> [33505.595981]  [<ffffffff8108ebd4>] ? ktime_get+0x54/0xe0
> [33505.595983]  [<ffffffff810518a7>] __do_softirq+0xc7/0x230
> [33505.595987]  [<ffffffff8168fe0c>] call_softirq+0x1c/0x30
> [33505.595990]  [<ffffffff81004415>] do_softirq+0x55/0x90
> [33505.595993]  [<ffffffff810516a5>] irq_exit+0x85/0xa0
> [33505.595996]  [<ffffffff8169042e>] smp_apic_timer_interrupt+0x6e/0x99
> [33505.596000]  [<ffffffff8168f80a>] apic_timer_interrupt+0x6a/0x70
> [33505.596002]  <EOI>  [<ffffffff8168567c>] ? __schedule+0x3ac/0x750
> [33505.596009]  [<ffffffff8100b1fd>] ? mwait_idle+0xad/0x1f0
> [33505.596011]  [<ffffffff8100a743>] cpu_idle+0xb3/0x100
> [33505.596014]  [<ffffffff8167d7d2>] start_secondary+0x1d7/0x1de
> [33505.596015] ---[ end trace 3d817d7c7ae67386 ]---
> [33505.596064] ixgbe 0000:83:00.0 eth2: Reset adapter
> [33556.011932] INFO: rcu_sched self-detected stall on CPU { 24}  (t=15001
> jiffies g=1985385 c=1985384 q=270786)
> [33556.011968] Pid: 0, comm: swapper/24 Tainted: G        W    3.8.2 #2
> [33556.011970] Call Trace:
> [33556.011972]  <IRQ>  [<ffffffff810bea1e>]
> rcu_check_callbacks+0x21e/0x7c0
> [33556.011986]  [<ffffffff8107f518>] ? account_system_time+0xe8/0x1e0
> [33556.011992]  [<ffffffff81058f48>] update_process_times+0x48/0x90
> [33556.011996]  [<ffffffff81095e06>] tick_sched_timer+0x56/0x130
> [33556.012000]  [<ffffffff8107099d>] __run_hrtimer+0x7d/0x1c0
> [33556.012002]  [<ffffffff81095db0>] ? tick_setup_sched_timer+0x110/0x110
> [33556.012004]  [<ffffffff81070d56>] hrtimer_interrupt+0xf6/0x230
> [33556.012010]  [<ffffffff81690429>] smp_apic_timer_interrupt+0x69/0x99
> [33556.012013]  [<ffffffff8168f80a>] apic_timer_interrupt+0x6a/0x70
> [33556.012017]  [<ffffffff815d3deb>] ?
> __inet_lookup_established+0xcb/0x2d0
> [33556.012020]  [<ffffffff815cab80>] ? inet_del_protocol+0x40/0x40
> [33556.012024]  [<ffffffff815f078c>] tcp_v4_early_demux+0xac/0x170
> [33556.012025]  [<ffffffff815caccd>] ip_rcv_finish+0x14d/0x360
> [33556.012027]  [<ffffffff815cb246>] ip_rcv+0x226/0x310
> [33556.012032]  [<ffffffff815841a2>] __netif_receive_skb+0x492/0x640
> [33556.012034]  [<ffffffff8158455d>] netif_receive_skb+0x2d/0x90
> [33556.012036]  [<ffffffff815ed450>] ? tcp4_gro_receive+0xb0/0x130
> [33556.012038]  [<ffffffff81584655>] napi_gro_complete+0x95/0xe0
> [33556.012040]  [<ffffffff81584956>] dev_gro_receive+0x2b6/0x3b0
> [33556.012043]  [<ffffffff8158508b>] napi_gro_receive+0x5b/0x130
> [33556.012051]  [<ffffffffa01db04a>] ixgbe_poll+0x54a/0x1180 [ixgbe]
> [33556.012054]  [<ffffffff810792fa>] ? enqueue_task+0x6a/0x80
> [33556.012056]  [<ffffffff81584c15>] net_rx_action+0xf5/0x260
> [33556.012058]  [<ffffffff810518a7>] __do_softirq+0xc7/0x230
> [33556.012061]  [<ffffffff8168fe0c>] call_softirq+0x1c/0x30
> [33556.012064]  [<ffffffff81004415>] do_softirq+0x55/0x90
> [33556.012066]  [<ffffffff810516a5>] irq_exit+0x85/0xa0
> [33556.012068]  [<ffffffff81690346>] do_IRQ+0x66/0xe0
> [33556.012071]  [<ffffffff81686daa>] common_interrupt+0x6a/0x6a
> [33556.012073]  <EOI>  [<ffffffff8168567c>] ? __schedule+0x3ac/0x750
> [33556.012078]  [<ffffffff8100b1fd>] ? mwait_idle+0xad/0x1f0
> [33556.012080]  [<ffffffff8100a743>] cpu_idle+0xb3/0x100
> [33556.012082]  [<ffffffff8167d7d2>] start_secondary+0x1d7/0x1de
> [33716.090584] INFO: task kworker/4:2:882 blocked for more than 120
> seconds.
> [33716.090602] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [33716.090618] kworker/4:2     D ffffffff81807160     0   882      2
> 0x00000000
> [33716.090622]  ffff881fd2547ad8 0000000000000046 ffff881fd0ac2dc0
> 0000000000012700
> [33716.090624]  ffff881fd2547fd8 ffff881fd2546010 0000000000012700
> 0000000000012700
> [33716.090626]  ffff881fd2547fd8 0000000000012700 ffff881fd3655b80
> ffff881fd0ac2dc0
> [33716.090628] Call Trace:
> [33716.090639]  [<ffffffff81685ae9>] schedule+0x29/0x70
> [33716.090642]  [<ffffffff81683de5>] schedule_timeout+0x165/0x200
> [33716.090647]  [<ffffffff810283fe>] ? physflat_send_IPI_mask+0xe/0x10
> [33716.090650]  [<ffffffff8107d02e>] ? try_to_wake_up+0x23e/0x2b0
> [33716.090653]  [<ffffffff81685158>] wait_for_common+0xc8/0x160
> [33716.090654]  [<ffffffff8107d0a0>] ? try_to_wake_up+0x2b0/0x2b0
> [33716.090660]  [<ffffffff810bc890>] ? rcu_cpu_stall_reset+0x60/0x60
> [33716.090662]  [<ffffffff816852cd>] wait_for_completion+0x1d/0x20
> [33716.090665]  [<ffffffff810b2536>] __stop_cpus+0x56/0x80
> [33716.090667]  [<ffffffff810bc890>] ? rcu_cpu_stall_reset+0x60/0x60
> [33716.090669]  [<ffffffff810b25ad>] try_stop_cpus+0x4d/0x80
> [33716.090672]  [<ffffffff810bf0bb>]
> synchronize_sched_expedited+0xfb/0x1d0
> [33716.090674]  [<ffffffff810bf19e>] synchronize_rcu_expedited+0xe/0x10
> [33716.090678]  [<ffffffff8157e1f5>] synchronize_net+0x25/0x30
> [33716.090683]  [<ffffffff815a1bc4>] dev_deactivate_many+0x254/0x260
> [33716.090685]  [<ffffffff815a1bfd>] dev_deactivate+0x2d/0x40
> [33716.090688]  [<ffffffff81593dc4>] linkwatch_do_dev+0x34/0x60
> [33716.090690]  [<ffffffff81593fa3>] __linkwatch_run_queue+0xf3/0x1e0
> [33716.090692]  [<ffffffff815940b5>] linkwatch_event+0x25/0x30
> [33716.090696]  [<ffffffff810653f8>] process_one_work+0x168/0x450
> [33716.090699]  [<ffffffff8106757b>] worker_thread+0x12b/0x3d0
> [33716.090702]  [<ffffffff81067450>] ? manage_workers+0x300/0x300
> [33716.090704]  [<ffffffff8106c5ee>] kthread+0xce/0xe0
> [33716.090706]  [<ffffffff8106c520>] ?
> kthread_freezable_should_stop+0x70/0x70
> [33716.090709]  [<ffffffff8168ec5c>] ret_from_fork+0x7c/0xb0
> [33716.090711]  [<ffffffff8106c520>] ?
> kthread_freezable_should_stop+0x70/0x70
>
> [more hung processes bailing]
>
> [37335.739761] INFO: rcu_sched self-detected stall on CPU { 24}  (t=960083
> jiffies g=1985385 c=1985384 q=19390495)
> [37335.739828] Pid: 0, comm: swapper/24 Tainted: G        W    3.8.2 #2
> [37335.739830] Call Trace:
> [37335.739832]  <IRQ>  [<ffffffff810bea1e>]
> rcu_check_callbacks+0x21e/0x7c0
> [37335.739847]  [<ffffffff8107f518>] ? account_system_time+0xe8/0x1e0
> [37335.739853]  [<ffffffff81058f48>] update_process_times+0x48/0x90
> [37335.739857]  [<ffffffff81095e06>] tick_sched_timer+0x56/0x130
> [37335.739860]  [<ffffffff8107099d>] __run_hrtimer+0x7d/0x1c0
> [37335.739863]  [<ffffffff81095db0>] ? tick_setup_sched_timer+0x110/0x110
> [37335.739865]  [<ffffffff81070d56>] hrtimer_interrupt+0xf6/0x230
> [37335.739871]  [<ffffffff81690429>] smp_apic_timer_interrupt+0x69/0x99
> [37335.739874]  [<ffffffff8168f80a>] apic_timer_interrupt+0x6a/0x70
> [37335.739878]  [<ffffffff815d3def>] ?
> __inet_lookup_established+0xcf/0x2d0
> [37335.739880]  [<ffffffff815cab80>] ? inet_del_protocol+0x40/0x40
> [37335.739884]  [<ffffffff815f078c>] tcp_v4_early_demux+0xac/0x170
> [37335.739886]  [<ffffffff815caccd>] ip_rcv_finish+0x14d/0x360
> [37335.739888]  [<ffffffff815cb246>] ip_rcv+0x226/0x310
> [37335.739892]  [<ffffffff815841a2>] __netif_receive_skb+0x492/0x640
> [37335.739895]  [<ffffffff8158455d>] netif_receive_skb+0x2d/0x90
> [37335.739897]  [<ffffffff815ed450>] ? tcp4_gro_receive+0xb0/0x130
> [37335.739899]  [<ffffffff81584655>] napi_gro_complete+0x95/0xe0
> [37335.739901]  [<ffffffff81584956>] dev_gro_receive+0x2b6/0x3b0
> [37335.739903]  [<ffffffff8158508b>] napi_gro_receive+0x5b/0x130
> [37335.739911]  [<ffffffffa01db04a>] ixgbe_poll+0x54a/0x1180 [ixgbe]
> [37335.739915]  [<ffffffff810792fa>] ? enqueue_task+0x6a/0x80
> [37335.739917]  [<ffffffff81584c15>] net_rx_action+0xf5/0x260
> [37335.739919]  [<ffffffff810518a7>] __do_softirq+0xc7/0x230
> [37335.739922]  [<ffffffff8168fe0c>] call_softirq+0x1c/0x30
> [37335.739927]  [<ffffffff81004415>] do_softirq+0x55/0x90
> [37335.739928]  [<ffffffff810516a5>] irq_exit+0x85/0xa0
> [37335.739931]  [<ffffffff81690346>] do_IRQ+0x66/0xe0
> [37335.739937]  [<ffffffff81686daa>] common_interrupt+0x6a/0x6a
> [37335.739938]  <EOI>  [<ffffffff8168567c>] ? __schedule+0x3ac/0x750
> [37335.739943]  [<ffffffff8100b1fd>] ? mwait_idle+0xad/0x1f0
> [37335.739945]  [<ffffffff8100a743>] cpu_idle+0xb3/0x100
> [37335.739948]  [<ffffffff8167d7d2>] start_secondary+0x1d7/0x1de
> [37515.727179] INFO: rcu_sched self-detected stall on CPU { 24}
> (t=1005087 jiffies g=1985385 c=1985384 q=20855557)
> [37515.727246] Pid: 0, comm: swapper/24 Tainted: G        W    3.8.2 #2
> [37515.727249] Call Trace:
> [37515.727251]  <IRQ>  [<ffffffff810bea1e>]
> rcu_check_callbacks+0x21e/0x7c0
> [37515.727265]  [<ffffffff8107f518>] ? account_system_time+0xe8/0x1e0
> [37515.727271]  [<ffffffff81058f48>] update_process_times+0x48/0x90
> [37515.727275]  [<ffffffff81095e06>] tick_sched_timer+0x56/0x130
> [37515.727279]  [<ffffffff8107099d>] __run_hrtimer+0x7d/0x1c0
> [37515.727281]  [<ffffffff81095db0>] ? tick_setup_sched_timer+0x110/0x110
> [37515.727283]  [<ffffffff81070d56>] hrtimer_interrupt+0xf6/0x230
> [37515.727289]  [<ffffffff81690429>] smp_apic_timer_interrupt+0x69/0x99
> [37515.727292]  [<ffffffff8168f80a>] apic_timer_interrupt+0x6a/0x70
> [37515.727296]  [<ffffffff815d3deb>] ?
> __inet_lookup_established+0xcb/0x2d0
> [37515.727298]  [<ffffffff815cab80>] ? inet_del_protocol+0x40/0x40
> [37515.727302]  [<ffffffff815f078c>] tcp_v4_early_demux+0xac/0x170
> [37515.727304]  [<ffffffff815caccd>] ip_rcv_finish+0x14d/0x360
> [37515.727306]  [<ffffffff815cb246>] ip_rcv+0x226/0x310
> [37515.727310]  [<ffffffff815841a2>] __netif_receive_skb+0x492/0x640
> [37515.727312]  [<ffffffff8158455d>] netif_receive_skb+0x2d/0x90
> [37515.727315]  [<ffffffff815ed450>] ? tcp4_gro_receive+0xb0/0x130
> [37515.727317]  [<ffffffff81584655>] napi_gro_complete+0x95/0xe0
> [37515.727319]  [<ffffffff81584956>] dev_gro_receive+0x2b6/0x3b0
> [37515.727322]  [<ffffffff8158508b>] napi_gro_receive+0x5b/0x130
> [37515.727330]  [<ffffffffa01db04a>] ixgbe_poll+0x54a/0x1180 [ixgbe]
> [37515.727334]  [<ffffffff810792fa>] ? enqueue_task+0x6a/0x80
> [37515.727336]  [<ffffffff81584c15>] net_rx_action+0xf5/0x260
> [37515.727338]  [<ffffffff810518a7>] __do_softirq+0xc7/0x230
> [37515.727341]  [<ffffffff8168fe0c>] call_softirq+0x1c/0x30
> [37515.727345]  [<ffffffff81004415>] do_softirq+0x55/0x90
> [37515.727346]  [<ffffffff810516a5>] irq_exit+0x85/0xa0
> [37515.727349]  [<ffffffff81690346>] do_IRQ+0x66/0xe0
> [37515.727354]  [<ffffffff81686daa>] common_interrupt+0x6a/0x6a
> [37515.727355]  <EOI>  [<ffffffff8168567c>] ? __schedule+0x3ac/0x750
> [37515.727360]  [<ffffffff8100b1fd>] ? mwait_idle+0xad/0x1f0
> [37515.727362]  [<ffffffff8100a743>] cpu_idle+0xb3/0x100
> [37515.727365]  [<ffffffff8167d7d2>] start_secondary+0x1d7/0x1de
>
> ... then swapper just does this until someone reboots the box.
>
> Apologies for the ugly paste.
>
> Thanks,
> -Dormando
