[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1442489793.131189.142.camel@infradead.org>
Date: Thu, 17 Sep 2015 12:36:33 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Francois Romieu <romieu@...zoreil.com>
Cc: Stephen Hemminger <stephen@...workplumber.org>,
David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared
On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote:
>
> [...]
> > [308309.574551] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status
> c 2b 0 80ff
>
> Rx and Tx are enabled.
>
> Instant (untested) hack below.
Thanks; I'll try that. In fact since updating to 4.2 the problem has
got worse — now the whole machine dies:
[ 232.064630] ------------[ cut here ]------------
[ 232.069282] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x1e5/0x200()
[ 232.077840] NETDEV WATCHDOG: eth1 (8139cp): transmit queue 0 timed out
[ 232.084380] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586
[ 232.157787] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-gx+ #25
[ 232.163982] 00000000 c10313eb dee95000 00000000 ffffff32 00000258 c1031446 00000009
[ 232.171988] dec0bf74 c13c3afc dec0bf8c c1272ef5 c13bfe82 0000012f c13c3afc dee95000
[ 232.179978] e04cfd3c 00000000 dee95000 dee95240 00000258 80000100 c1272d10 dee95000
[ 232.188012] Call Trace:
[ 232.190482] [<c10313eb>] ? warn_slowpath_common+0x5b/0x90
[ 232.196063] [<c1031446>] ? warn_slowpath_fmt+0x26/0x30
[ 232.201307] [<c1272ef5>] ? dev_watchdog+0x1e5/0x200
[ 232.206317] [<c1272d10>] ? qdisc_rcu_free+0x30/0x30
[ 232.211307] [<c105b19e>] ? call_timer_fn.isra.7+0xe/0x60
[ 232.216811] [<c1272d10>] ? qdisc_rcu_free+0x30/0x30
[ 232.221794] [<c105b33d>] ? run_timer_softirq+0xfd/0x1b0
[ 232.227221] [<c1033777>] ? __do_softirq+0xa7/0x190
[ 232.232117] [<c10336d0>] ? __hrtimer_tasklet_trampoline+0x20/0x20
[ 232.238395] [<c100336b>] ? do_softirq_own_stack+0x1b/0x20
[ 232.243881] <IRQ> [<c1003165>] ? do_IRQ+0x35/0xa0
[ 232.248904] [<c131e2a9>] ? common_interrupt+0x29/0x30
[ 232.254141] [<c104007b>] ? put_unbound_pool+0x17b/0x1a0
[ 232.259470] [<c1008bf2>] ? default_idle+0x2/0x10
[ 232.264213] [<c1009106>] ? arch_cpu_idle+0x6/0x10
[ 232.269026] [<c104eee5>] ? cpu_startup_entry+0xf5/0x190
[ 232.274459] [<c142a9b9>] ? start_kernel+0x2e5/0x2e8
[ 232.279432] ---[ end trace 30ae4e701c36b431 ]---
[ 232.284167] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status c 2b 1 80ac
[ 260.106382] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:0]
[ 260.113515] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586
[ 260.116369] CPU: 0 PID: 0 Comm: swapper Tainted: G W 4.2.0-gx+ #25
[ 260.116369] task: c13f7540 ti: c13f0000 task.ti: c13f0000
[ 260.116369] EIP: 0060:[<c131d6fa>] EFLAGS: 00200292 CPU: 0
[ 260.116369] EIP is at _raw_spin_unlock_irqrestore+0xa/0x10
[ 260.116369] EAX: 00000000 EBX: 00200292 ECX: e04d8100 EDX: 00200292
[ 260.116369] ESI: 00000000 EDI: ffffff32 EBP: 00000258 ESP: dec0bf90
[ 260.116369] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 260.116369] CR0: 8005003b CR2: 098c4c50 CR3: 1ef88000 CR4: 00000090
[ 260.116369] Stack:
[ 260.116369] dee95000 c1272ecb dee95000 dee95240 00000258 80000100 c1272d10 dee95000
[ 260.116369] 00000100 c105b19e 00000410 c1272d10 c105b33d 00000000 c146d524 00000001
[ 260.116369] 00000002 c1033777 00000002 ffffe577 c13f2000 0000000a 00200000 c13f1f70
[ 260.116369] Call Trace:
[ 260.116369] [<c1272ecb>] ? dev_watchdog+0x1bb/0x200
[ 260.116369] [<c1272d10>] ? qdisc_rcu_free+0x30/0x30
[ 260.116369] [<c105b19e>] ? call_timer_fn.isra.7+0xe/0x60
[ 260.116369] [<c1272d10>] ? qdisc_rcu_free+0x30/0x30
[ 260.116369] [<c105b33d>] ? run_timer_softirq+0xfd/0x1b0
[ 260.116369] [<c1033777>] ? __do_softirq+0xa7/0x190
[ 260.116369] [<c10336d0>] ? __hrtimer_tasklet_trampoline+0x20/0x20
[ 260.116369] [<c100336b>] ? do_softirq_own_stack+0x1b/0x20
[ 260.116369] <IRQ>
[ 260.116369] [<c1003165>] ? do_IRQ+0x35/0xa0
[ 260.116369] [<c131e2a9>] ? common_interrupt+0x29/0x30
[ 260.116369] [<c104007b>] ? put_unbound_pool+0x17b/0x1a0
[ 260.116369] [<c1008bf2>] ? default_idle+0x2/0x10
[ 260.116369] [<c1009106>] ? arch_cpu_idle+0x6/0x10
[ 260.116369] [<c104eee5>] ? cpu_startup_entry+0xf5/0x190
[ 260.116369] [<c142a9b9>] ? start_kernel+0x2e5/0x2e8
[ 260.116369] Code: 00 b8 01 00 00 00 c3 8d 76 00 8d bc 27 00 00 00 00 e8 db 2e d3 ff c3 8d 76 00 8d bc 27 00 00 00 00 53 89 d3 e8 c8 2e d3 ff 53 9d <5b> c3 8d 74 26 00 e8 bb 2e d3 ff fb c3 89 f6 8d bc 27 00 00 00
At this point even sysrq on the serial console isn't working; I'm
going to have to go and physically reset it.
(gdb) list *dev_watchdog+0x1bb
0xc1272ecb is in dev_watchdog (net/sched/sch_generic.c:304).
299 }
300
301 if (some_queue_timedout) {
302 WARN_ONCE(1, KERN_INFO
"NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
303 dev->name,
netdev_drivername(dev), i);
304 dev->netdev_ops
->ndo_tx_timeout(dev);
305 }
306 if (!mod_timer(&dev->watchdog_timer,
307 round_jiffies(jiffies
+
308 dev
->watchdog_timeo)))
So that _raw_spin_unlock_irqrestore() is going to be a tailcall from
the end of cp_tx_timeout(). Moderately confused that it dies on *unlock*.
_raw_spin_lock_irqrestore+0xa is the instruction after a 'popf', which
makes me wonder if it dies in an IRQ storm. Although then *some* of the
NMI watchdog invocations would surely show an IRQ on the stack, but
they
don't; they're all right on the same instruction.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@...el.com Intel Corporation
Download attachment "smime.p7s" of type "application/x-pkcs7-signature" (5691 bytes)
Powered by blists - more mailing lists