Message-ID: <7875709d-bdac-4081-f943-67e592a3573a@gmail.com>
Date: Mon, 20 Jan 2020 13:26:41 -0800
From: PGNet Dev <pgnet.dev@...il.com>
To: netdev@...r.kernel.org
Subject: kernel 5.4.13 'NETDEV WATCHDOG' timeout errors -- is it kernel? driver? bios?
Cc: xen-users@...ts.xenproject.org
I'm bringing a server, running Xen 4.13 + kernel 5.4.13-24.g5cf5394-default, back up after disk changes.
On boot, I'm seeing this 'NETDEV WATCHDOG' warning, after which the adapter resets repeatedly and the network is unstable/dropped:
[ 35.344678] ------------[ cut here ]------------
[ 35.344703] NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out
[ 35.344723] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x248/0x250
[ 35.344729] Modules linked in: af_packet br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs rfkill xen_pciback xen_netback xen_blkback xen_gntalloc dmi_sysfs xen_gntdev xen_evtchn nct6775 hwmon_vid msr sch_fq_codel intel_rapl_msr intel_rapl_common mei_wdt snd_hda_codec_hdmi nouveau raid10 mei_hdcp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mxm_wmi snd_hda_codec_realtek snd_hda_codec_generic wmi aesni_intel ledtrig_audio crypto_simd cryptd glue_helper ttm snd_hda_intel snd_intel_nhlt snd_hda_codec drm_kms_helper intel_pch_thermal snd_hda_core i2c_i801 drm snd_hwdep fb_sys_fops mei_me snd_pcm syscopyarea sysfillrect snd_timer sysimgblt snd mei soundcore ie31200_edac button xenfs xen_privcmd hid_generic usbhid raid1 md_mod firewire_ohci crc32c_intel igb firewire_core i2c_algo_bit dca crc_itu_t r8169 realtek libphy xhci_pci xhci_hcd ehci_pci ehci_hcd e1000e usbcore mvsas libsas scsi_transport_sas fan thermal video tcp_bbr sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
[ 35.344764] n_hdlc slhc nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache efivarfs
[ 35.344770] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.13-24.g5cf5394-default #1 openSUSE Tumbleweed (unreleased)
[ 35.344770] Hardware name: Supermicro X10SAT/X10SAT, BIOS 3.0 05/26/2015
[ 35.344772] RIP: 0010:dev_watchdog+0x248/0x250
[ 35.344773] Code: 85 c0 75 e5 eb 9f 4c 89 ef c6 05 41 93 b0 00 01 e8 dd f0 fa ff 44 89 e1 4c 89 ee 48 c7 c7 48 4d 16 82 48 89 c2 e8 c6 4d 86 ff <0f> 0b eb 80 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55
[ 35.344774] RSP: 0018:ffffc90000003e68 EFLAGS: 00010286
[ 35.344775] RAX: 0000000000000000 RBX: ffff88815caf7000 RCX: 000000000000083f
[ 35.344775] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
[ 35.344776] RBP: ffff88815c79c45c R08: ffff888164a19a18 R09: 0000000000000003
[ 35.344777] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 35.344777] R13: ffff88815c79c000 R14: ffff88815c79c480 R15: 0000000000000001
[ 35.344778] FS: 0000000000000000(0000) GS:ffff888164a00000(0000) knlGS:0000000000000000
[ 35.344779] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.344779] CR2: 00007ff3eaeb0090 CR3: 00000001632e6003 CR4: 00000000001606b0
[ 35.344782] Call Trace:
[ 35.344788] <IRQ>
[ 35.344791] ? pfifo_fast_enqueue+0x150/0x150
[ 35.344793] call_timer_fn+0x2d/0x130
[ 35.344795] __run_timers.part.0+0x185/0x280
[ 35.344797] ? pfifo_fast_enqueue+0x150/0x150
[ 35.344800] ? handle_irq_event_percpu+0x72/0x80
[ 35.344805] run_timer_softirq+0x26/0x50
[ 35.344807] __do_softirq+0x118/0x33b
[ 35.344810] irq_exit+0xb9/0xc0
[ 35.344814] xen_evtchn_do_upcall+0x2c/0x40
[ 35.344819] xen_hvm_callback_vector+0xf/0x20
[ 35.344820] </IRQ>
[ 35.344821] RIP: 0010:native_safe_halt+0xe/0x10
[ 35.344822] Code: 90 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d e6 02 4a 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d d6 02 4a 00 fb f4 <c3> 90 0f 1f 44 00 00 41 54 55 53 e8 12 67 7b ff 65 8b 2d 6b 75 6a
[ 35.344823] RSP: 0018:ffffffff82203e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
[ 35.344824] RAX: 0000000080000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 35.344825] RDX: 0000000000000001 RSI: 0000000000000083 RDI: 0000000000000000
[ 35.344825] RBP: 0000000000000000 R08: 00000015956e8dad R09: 0000000000000000
[ 35.344826] R10: 0000000000000000 R11: 0000000000000018 R12: ffffffff82214780
[ 35.344826] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff82214780
[ 35.344828] default_idle+0x1f/0x140
[ 35.344831] do_idle+0x1ff/0x280
[ 35.344832] cpu_startup_entry+0x19/0x20
[ 35.344835] start_kernel+0x4f2/0x511
[ 35.344838] secondary_startup_64+0xb6/0xc0
[ 35.344839] ---[ end trace 7edaffa8e97068ae ]---
[ 35.344861] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 40.096402] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 45.788752] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 50.726862] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 55.783348] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 64.581820] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 74.730696] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 79.628577] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 84.704205] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 89.701827] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 99.551935] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 104.369653] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 109.802904] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 114.600968] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 119.775784] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 124.752690] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 134.622908] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 139.441154] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 144.615671] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 149.612495] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 159.722815] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 164.561039] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 169.696432] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 174.753536] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
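To put numbers on the reset/link-up cycle in the log above, here is a rough Python sketch that pairs each "Reset adapter unexpectedly" line with the next "NIC Link is Up" line and reports how long the link took to come back. The function names (`parse_events`, `recovery_times`) are made up for illustration, and the embedded sample is only the first four such lines from the log; feed it the full dmesg text to see every cycle.

```python
import re

# First four reset/link-up lines copied from the dmesg output above.
# Timestamps are seconds since boot.
LOG = """\
[ 35.344861] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 40.096402] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 45.788752] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
[ 50.726862] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
"""

# Matches "[ <seconds>] <message>" as printed by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_events(text):
    """Return a list of (timestamp, kind) where kind is 'reset' or 'up'."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "Reset adapter unexpectedly" in msg:
            events.append((ts, "reset"))
        elif "NIC Link is Up" in msg:
            events.append((ts, "up"))
    return events


def recovery_times(events):
    """Pair each reset with the next link-up: (reset_ts, seconds_until_up)."""
    cycles = []
    reset_ts = None
    for ts, kind in events:
        if kind == "reset":
            reset_ts = ts
        elif kind == "up" and reset_ts is not None:
            cycles.append((reset_ts, ts - reset_ts))
            reset_ts = None
    return cycles


for reset_ts, delay in recovery_times(parse_events(LOG)):
    print(f"reset at {reset_ts:.3f}s, link back up after {delay:.3f}s")
```

On the lines shown in this message, the link comes back roughly five seconds after each reset and is reset again five to ten seconds later.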
Here's the register dump at the moment:
ethtool -d eno1
MAC Registers
-------------
0x00000: CTRL (Device control register) 0x40180240
      Endian mode (buffers): little
      Link reset: normal
      Set link up: 1
      Invert Loss-Of-Signal: no
      Receive flow control: disabled
      Transmit flow control: disabled
      VLAN mode: enabled
      Auto speed detect: disabled
      Speed select: 1000Mb/s
      Force speed: no
      Force duplex: no
0x00008: STATUS (Device status register) 0x00080083
      Duplex: full
      Link up: link config
      TBI mode: disabled
      Link speed: 1000Mb/s
      Bus type: PCI
      Bus speed: 33MHz
      Bus width: 32-bit
0x00100: RCTL (Receive control register) 0x04008000
      Receiver: disabled
      Store bad packets: disabled
      Unicast promiscuous: disabled
      Multicast promiscuous: disabled
      Long packet: disabled
      Descriptor minimum threshold size: 1/2
      Broadcast accept mode: accept
      VLAN filter: disabled
      Canonical form indicator: disabled
      Discard pause frames: filtered
      Pass MAC control frames: don't pass
      Receive buffer size: 2048
0x02808: RDLEN (Receive desc length) 0x00001000
0x02810: RDH (Receive desc head) 0x00000001
0x02818: RDT (Receive desc tail) 0x000000F0
0x02820: RDTR (Receive delay timer) 0x00000000
0x00400: TCTL (Transmit ctrl register) 0x3103F0F8
      Transmitter: disabled
      Pad short packets: enabled
      Software XOFF Transmission: disabled
      Re-transmit on late collision: enabled
0x03808: TDLEN (Transmit desc length) 0x00001000
0x03810: TDH (Transmit desc head) 0x00000001
0x03818: TDT (Transmit desc tail) 0x00000001
0x03820: TIDV (Transmit delay timer) 0x00000008
PHY type: unknown
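As a cross-check on the decoded fields in that dump, here is a small Python sketch that tests the raw register values directly. The bit positions (bit 1 as link-up in STATUS, bit 1 as the enable bit in RCTL and TCTL) are my assumption from the e1000-family register layout and should be verified against the driver headers; the `summarize` helper is hypothetical.

```python
# Raw values copied from the `ethtool -d eno1` dump above.
REGS = {
    "STATUS": 0x00080083,
    "RCTL": 0x04008000,
    "TCTL": 0x3103F0F8,
    "TDH": 0x00000001,
    "TDT": 0x00000001,
}

# Assumed bit masks (e1000-family layout; check against your kernel tree).
STATUS_LU = 1 << 1  # link up
RCTL_EN = 1 << 1    # receiver enable
TCTL_EN = 1 << 1    # transmitter enable


def summarize(regs):
    """Reduce the raw registers to a few yes/no facts about the NIC state."""
    return {
        "link_up": bool(regs["STATUS"] & STATUS_LU),
        "rx_enabled": bool(regs["RCTL"] & RCTL_EN),
        "tx_enabled": bool(regs["TCTL"] & TCTL_EN),
        # Head == tail means the hardware has no pending TX descriptors.
        "tx_ring_empty": regs["TDH"] == regs["TDT"],
    }


for name, value in summarize(REGS).items():
    print(f"{name}: {value}")
```

Under those assumptions the raw values agree with ethtool's decoding: link is reported up while both receiver and transmitter are disabled. One possible reading is that the dump was captured while the adapter was partway through one of the resets, but that is a guess.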
Is this a kernel, Xen, e1000e driver, and/or BIOS issue?
Is there any known fix, workaround, or discussion to point to?