Message-ID: <9f65a5e7-2380-86fd-79cf-bae23a4e79fa@gmail.com>
Date: Tue, 21 Jan 2020 22:16:27 -0800
From: PGNet Dev <pgnet.dev@...il.com>
To: netdev@...r.kernel.org
Subject: Re: kernel 5.4.13 'NETDEV WATCHDOG' timeout errors -- is it kernel? driver? bios?
Any suggestions on this one?
On 1/20/20 1:26 PM, PGNet Dev wrote:
> xen-users@...ts.xenproject.org
>
> I'm bringing a server, running Xen 4.13 + kernel 5.4.13-24.g5cf5394-default, back up after disk changes.
>
> On boot, I'm seeing these 'NETDEV WATCHDOG' warnings, after which the network is unstable and keeps dropping:
>
> [ 35.344678] ------------[ cut here ]------------
> [ 35.344703] NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out
> [ 35.344723] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x248/0x250
> [ 35.344729] Modules linked in: af_packet br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs rfkill xen_pciback xen_netback xen_blkback xen_gntalloc dmi_sysfs xen_gntdev xen_evtchn nct6775 hwmon_vid msr sch_fq_codel intel_rapl_msr intel_rapl_common mei_wdt snd_hda_codec_hdmi nouveau raid10 mei_hdcp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mxm_wmi snd_hda_codec_realtek snd_hda_codec_generic wmi aesni_intel ledtrig_audio crypto_simd cryptd glue_helper ttm snd_hda_intel snd_intel_nhlt snd_hda_codec drm_kms_helper intel_pch_thermal snd_hda_core i2c_i801 drm snd_hwdep fb_sys_fops mei_me snd_pcm syscopyarea sysfillrect snd_timer sysimgblt snd mei soundcore ie31200_edac button xenfs xen_privcmd hid_generic usbhid raid1 md_mod firewire_ohci crc32c_intel igb firewire_core i2c_algo_bit dca crc_itu_t r8169 realtek libphy xhci_pci xhci_hcd ehci_pci ehci_hcd e1000e usbcore mvsas libsas scsi_transport_sas fan thermal video tcp_bbr sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
> [ 35.344764] n_hdlc slhc nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache efivarfs
> [ 35.344770] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.13-24.g5cf5394-default #1 openSUSE Tumbleweed (unreleased)
> [ 35.344770] Hardware name: Supermicro X10SAT/X10SAT, BIOS 3.0 05/26/2015
> [ 35.344772] RIP: 0010:dev_watchdog+0x248/0x250
> [ 35.344773] Code: 85 c0 75 e5 eb 9f 4c 89 ef c6 05 41 93 b0 00 01 e8 dd f0 fa ff 44 89 e1 4c 89 ee 48 c7 c7 48 4d 16 82 48 89 c2 e8 c6 4d 86 ff <0f> 0b eb 80 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55
> [ 35.344774] RSP: 0018:ffffc90000003e68 EFLAGS: 00010286
> [ 35.344775] RAX: 0000000000000000 RBX: ffff88815caf7000 RCX: 000000000000083f
> [ 35.344775] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
> [ 35.344776] RBP: ffff88815c79c45c R08: ffff888164a19a18 R09: 0000000000000003
> [ 35.344777] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> [ 35.344777] R13: ffff88815c79c000 R14: ffff88815c79c480 R15: 0000000000000001
> [ 35.344778] FS: 0000000000000000(0000) GS:ffff888164a00000(0000) knlGS:0000000000000000
> [ 35.344779] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 35.344779] CR2: 00007ff3eaeb0090 CR3: 00000001632e6003 CR4: 00000000001606b0
> [ 35.344782] Call Trace:
> [ 35.344788] <IRQ>
> [ 35.344791] ? pfifo_fast_enqueue+0x150/0x150
> [ 35.344793] call_timer_fn+0x2d/0x130
> [ 35.344795] __run_timers.part.0+0x185/0x280
> [ 35.344797] ? pfifo_fast_enqueue+0x150/0x150
> [ 35.344800] ? handle_irq_event_percpu+0x72/0x80
> [ 35.344805] run_timer_softirq+0x26/0x50
> [ 35.344807] __do_softirq+0x118/0x33b
> [ 35.344810] irq_exit+0xb9/0xc0
> [ 35.344814] xen_evtchn_do_upcall+0x2c/0x40
> [ 35.344819] xen_hvm_callback_vector+0xf/0x20
> [ 35.344820] </IRQ>
> [ 35.344821] RIP: 0010:native_safe_halt+0xe/0x10
> [ 35.344822] Code: 90 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d e6 02 4a 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d d6 02 4a 00 fb f4 <c3> 90 0f 1f 44 00 00 41 54 55 53 e8 12 67 7b ff 65 8b 2d 6b 75 6a
> [ 35.344823] RSP: 0018:ffffffff82203e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
> [ 35.344824] RAX: 0000000080000000 RBX: 0000000000000000 RCX: 0000000000000001
> [ 35.344825] RDX: 0000000000000001 RSI: 0000000000000083 RDI: 0000000000000000
> [ 35.344825] RBP: 0000000000000000 R08: 00000015956e8dad R09: 0000000000000000
> [ 35.344826] R10: 0000000000000000 R11: 0000000000000018 R12: ffffffff82214780
> [ 35.344826] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff82214780
> [ 35.344828] default_idle+0x1f/0x140
> [ 35.344831] do_idle+0x1ff/0x280
> [ 35.344832] cpu_startup_entry+0x19/0x20
> [ 35.344835] start_kernel+0x4f2/0x511
> [ 35.344838] secondary_startup_64+0xb6/0xc0
> [ 35.344839] ---[ end trace 7edaffa8e97068ae ]---
> [ 35.344861] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 40.096402] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 45.788752] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 50.726862] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 55.783348] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 64.581820] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 74.730696] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 79.628577] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 84.704205] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 89.701827] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 99.551935] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 104.369653] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 109.802904] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 114.600968] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 119.775784] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 124.752690] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 134.622908] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 139.441154] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 144.615671] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 149.612495] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 159.722815] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 164.561039] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 169.696432] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 174.753536] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
>
>
> Here, at the moment:
>
> ethtool -d eno1
> MAC Registers
> -------------
> 0x00000: CTRL (Device control register) 0x40180240
> Endian mode (buffers): little
> Link reset: normal
> Set link up: 1
> Invert Loss-Of-Signal: no
> Receive flow control: disabled
> Transmit flow control: disabled
> VLAN mode: enabled
> Auto speed detect: disabled
> Speed select: 1000Mb/s
> Force speed: no
> Force duplex: no
> 0x00008: STATUS (Device status register) 0x00080083
> Duplex: full
> Link up: link config
> TBI mode: disabled
> Link speed: 1000Mb/s
> Bus type: PCI
> Bus speed: 33MHz
> Bus width: 32-bit
> 0x00100: RCTL (Receive control register) 0x04008000
> Receiver: disabled
> Store bad packets: disabled
> Unicast promiscuous: disabled
> Multicast promiscuous: disabled
> Long packet: disabled
> Descriptor minimum threshold size: 1/2
> Broadcast accept mode: accept
> VLAN filter: disabled
> Canonical form indicator: disabled
> Discard pause frames: filtered
> Pass MAC control frames: don't pass
> Receive buffer size: 2048
> 0x02808: RDLEN (Receive desc length) 0x00001000
> 0x02810: RDH (Receive desc head) 0x00000001
> 0x02818: RDT (Receive desc tail) 0x000000F0
> 0x02820: RDTR (Receive delay timer) 0x00000000
> 0x00400: TCTL (Transmit ctrl register) 0x3103F0F8
> Transmitter: disabled
> Pad short packets: enabled
> Software XOFF Transmission: disabled
> Re-transmit on late collision: enabled
> 0x03808: TDLEN (Transmit desc length) 0x00001000
> 0x03810: TDH (Transmit desc head) 0x00000001
> 0x03818: TDT (Transmit desc tail) 0x00000001
> 0x03820: TIDV (Transmit delay timer) 0x00000008
> PHY type: unknown
>
> Is this a kernel, Xen, e1000e driver, and/or BIOS issue?
>
> Is there any known fix, workaround, or discussion to point to?
>
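For anyone comparing notes, the reset / link-up cycle in the log above can be summarized with a short script. This is only a rough sketch: the function name `summarize_resets` and the regular expression are my own illustration, not part of any tool, and it assumes dmesg-style lines of the form `[ seconds] message` like the ones quoted here.

```python
import re

# Matches a dmesg-style timestamp and message, e.g.
# "> [ 35.344861] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly"
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def summarize_resets(log_text):
    """Return (reset_times, link_up_times, recovery_delays), all in seconds.

    Hypothetical helper for illustration only.
    """
    resets, link_ups = [], []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "Reset adapter unexpectedly" in msg:
            resets.append(ts)
        elif "NIC Link is Up" in msg:
            link_ups.append(ts)

    # Pair each reset with the first link-up reported after it.
    delays = []
    for r in resets:
        nxt = next((u for u in link_ups if u > r), None)
        if nxt is not None:
            delays.append(round(nxt - r, 3))
    return resets, link_ups, delays


# First four lines of the reset / link-up sequence from the report.
sample = """\
> [ 35.344861] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 40.096402] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 45.788752] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
> [ 50.726862] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
"""

resets, link_ups, delays = summarize_resets(sample)
print(f"{len(resets)} resets, {len(link_ups)} link-ups")
for r, d in zip(resets, delays):
    print(f"reset at {r:.3f}s, link back up {d:.3f}s later")
```

Feeding it the full output of `dmesg` should show whether the resets keep recurring at a steady interval or tail off, which may help when comparing kernels or driver settings.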