lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Mon, 26 Feb 2024 15:10:05 +0100
From: Arno Lehmann <al@...-lehmann.de>
To: netdev@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, intel-wired-lan@...ts.osuosl.org,
 "David S. Miller" <davem@...emloft.net>
Subject: Re: intel i225 NIC loses PCIe link, network becomes unusable)

Hi all,

I couldn't see any replies to this issue (but I'm also not subscribed to 
the list, probably something I should consider...) but I have another 
instance of the issue reported.

Now cc'ing other parties in the hope to get indication this is already 
solved (somehow) or will be looked into.

Additional kernel messages about latest case below the quoted original mail!

On 12.02.24 at 11:39, Arno Lehmann wrote:
> Hello everybody,
> 
> I'm struggling with the problem named in the subject.
> 
> Originally reported to the debian bug tracker; you'll find the history 
> here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1060706
> 
> Infrequently, and apparently randomly, I have the situation that the 
> PCIe link for the NIC is lost. Obviously, the network then becomes 
> unusable. rmmod / modprobe'ing the igc module does not resolve this 
> problem, a reboot is necessary.
> 
> I noticed this initially when installing the system last year, did a bit 
> of a search, found that the kernel option 'pcie_aspm=off' was supposed 
> to be useful, set that, and have that enabled ever since.
> 
> The problem persists.
> 
> Most recent case is this one:
> 
> [So Feb 11 15:47:18 2024] igc 0000:0b:00.0 eno1: NIC Link is Down
> [So Feb 11 15:47:21 2024] igc 0000:0b:00.0 eno1: NIC Link is Up 1000 
> Mbps Full Duplex, Flow Control: RX
> [So Feb 11 16:52:01 2024] igc 0000:0b:00.0 eno1: NIC Link is Down
> [So Feb 11 16:52:05 2024] igc 0000:0b:00.0 eno1: NIC Link is Up 1000 
> Mbps Full Duplex, Flow Control: RX
> 
> (I have no idea if the above to events have any relevance.)
> 
> [So Feb 11 18:47:59 2024] igc 0000:0b:00.0 eno1: PCIe link lost, device 
> now detached
> [So Feb 11 18:47:59 2024] ------------[ cut here ]------------
> [So Feb 11 18:47:59 2024] igc: Failed to read reg 0xc030!
> [So Feb 11 18:47:59 2024] WARNING: CPU: 20 PID: 136256 at 
> drivers/net/ethernet/intel/igc/igc_main.c:6583 igc_rd32+0x8d/0xa0 [igc]
> [So Feb 11 18:47:59 2024] Modules linked in: rfcomm cpufreq_userspace 
> cpufreq_powersave cpufreq_ondemand cpufreq_conservative nfsv3 nfs_acl 
> rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache 
> netfs overlay qrtr cmac algif_hash algif_skcipher af_alg bnep sunrpc 
> binfmt_misc nls_ascii nls_cp437 vfat fat ext4 mbcache jbd2 
> intel_rapl_msr intel_rapl_common btusb btrtl btbcm btintel btmtk 
> bluetooth mt7921e snd_hda_codec_hdmi mt7921_common mt76_connac_lib 
> edac_mce_amd snd_hda_intel mt76 snd_intel_dspcfg kvm_amd 
> snd_intel_sdw_acpi sha3_generic mac80211 jitterentropy_rng snd_usb_audio 
> uvcvideo snd_hda_codec drbg libarc4 videobuf2_vmalloc snd_usbmidi_lib 
> asus_nb_wmi eeepc_wmi kvm uvc videobuf2_memops snd_rawmidi ansi_cprng 
> snd_hda_core asus_wmi videobuf2_v4l2 snd_seq_device snd_hwdep 
> ecdh_generic irqbypass battery ecc ledtrig_audio videodev snd_pcm 
> sparse_keymap cfg80211 crc16 rapl snd_timer videobuf2_common 
> platform_profile wmi_bmof sp5100_tco pcspkr snd ccp mc watchdog k10temp 
> soundcore rfkill joydev sg evdev msr
> [So Feb 11 18:47:59 2024]  parport_pc ppdev lp parport fuse loop 
> efi_pstore configfs efivarfs ip_tables x_tables autofs4 xfs libcrc32c 
> crc32c_generic sd_mod dm_crypt dm_mod uas usb_storage hid_generic amdgpu 
> amdxcp drm_buddy gpu_sched usbhid i2c_algo_bit drm_suballoc_helper hid 
> drm_display_helper sr_mod cdrom cec rc_core crc32_pclmul drm_ttm_helper 
> crc32c_intel ghash_clmulni_intel ttm ahci sha512_ssse3 sha512_generic 
> libahci nvme xhci_pci drm_kms_helper libata xhci_hcd nvme_core drm 
> aesni_intel t10_pi usbcore scsi_mod crypto_simd crc64_rocksoft_generic 
> igc cryptd crc64_rocksoft crc_t10dif crct10dif_generic i2c_piix4 
> crct10dif_pclmul crc64 crct10dif_common scsi_common usb_common video wmi 
> gpio_amdpt gpio_generic button
> [So Feb 11 18:47:59 2024] CPU: 20 PID: 136256 Comm: kworker/20:0 Not 
> tainted 6.5.0-0.deb12.4-amd64 #1  Debian 6.5.10-1~bpo12+1
> [So Feb 11 18:47:59 2024] Hardware name: ASUS System Product Name/ROG 
> STRIX X670E-A GAMING WIFI, BIOS 1904 01/29/2024
> [So Feb 11 18:47:59 2024] Workqueue: events igc_watchdog_task [igc]
> [So Feb 11 18:47:59 2024] RIP: 0010:igc_rd32+0x8d/0xa0 [igc]
> [So Feb 11 18:47:59 2024] Code: 48 c7 c6 10 76 36 c0 e8 81 6a c1 d5 48 
> 8b bb 28 ff ff ff e8 05 d2 97 d5 84 c0 74 bc 89 ee 48 c7 c7 38 76 36 c0 
> e8 c3 ee 36 d5 <0f> 0b eb aa b8 ff ff ff ff e9 15 cf e7 d5 0f 1f 44 00 
> 00 90 90 90
> [So Feb 11 18:47:59 2024] RSP: 0018:ffffa203cfe8fdd8 EFLAGS: 00010282
> [So Feb 11 18:47:59 2024] RAX: 0000000000000000 RBX: ffff961b5c75ccb8 
> RCX: 0000000000000027
> [So Feb 11 18:47:59 2024] RDX: ffff962a5e7213c8 RSI: 0000000000000001 
> RDI: ffff962a5e7213c0
> [So Feb 11 18:47:59 2024] RBP: 000000000000c030 R08: 0000000000000000 
> R09: ffffa203cfe8fc68
> [So Feb 11 18:47:59 2024] R10: 0000000000000003 R11: ffff962a9de3ac28 
> R12: ffff961b5c75c000
> [So Feb 11 18:47:59 2024] R13: 0000000000000000 R14: ffff961b54c92d40 
> R15: 000000000000c030
> [So Feb 11 18:47:59 2024] FS:  0000000000000000(0000) 
> GS:ffff962a5e700000(0000) knlGS:0000000000000000
> [So Feb 11 18:47:59 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [So Feb 11 18:47:59 2024] CR2: 00007fb76de93000 CR3: 00000001153d0000 
> CR4: 0000000000750ee0
> [So Feb 11 18:47:59 2024] PKRU: 55555554
> [So Feb 11 18:47:59 2024] Call Trace:
> [So Feb 11 18:47:59 2024]  <TASK>
> [So Feb 11 18:47:59 2024]  ? igc_rd32+0x8d/0xa0 [igc]
> [So Feb 11 18:47:59 2024]  ? __warn+0x81/0x130
> [So Feb 11 18:47:59 2024]  ? igc_rd32+0x8d/0xa0 [igc]
> [So Feb 11 18:47:59 2024]  ? report_bug+0x171/0x1a0
> [So Feb 11 18:47:59 2024]  ? srso_alias_return_thunk+0x5/0x7f
> [So Feb 11 18:47:59 2024]  ? prb_read_valid+0x1b/0x30
> [So Feb 11 18:47:59 2024]  ? handle_bug+0x41/0x70
> [So Feb 11 18:47:59 2024]  ? exc_invalid_op+0x17/0x70
> [So Feb 11 18:47:59 2024]  ? asm_exc_invalid_op+0x1a/0x20
> [So Feb 11 18:47:59 2024]  ? igc_rd32+0x8d/0xa0 [igc]
> [So Feb 11 18:47:59 2024]  ? igc_rd32+0x8d/0xa0 [igc]
> [So Feb 11 18:47:59 2024]  igc_update_stats+0x8a/0x6d0 [igc]
> [So Feb 11 18:47:59 2024]  igc_watchdog_task+0x9d/0x4a0 [igc]
> [So Feb 11 18:47:59 2024]  process_one_work+0x1df/0x3e0
> [So Feb 11 18:47:59 2024]  worker_thread+0x51/0x390
> [So Feb 11 18:47:59 2024]  ? __pfx_worker_thread+0x10/0x10
> [So Feb 11 18:47:59 2024]  kthread+0xe5/0x120
> [So Feb 11 18:47:59 2024]  ? __pfx_kthread+0x10/0x10
> [So Feb 11 18:47:59 2024]  ret_from_fork+0x31/0x50
> [So Feb 11 18:47:59 2024]  ? __pfx_kthread+0x10/0x10
> [So Feb 11 18:47:59 2024]  ret_from_fork_asm+0x1b/0x30
> [So Feb 11 18:47:59 2024]  </TASK>
> [So Feb 11 18:47:59 2024] ---[ end trace 0000000000000000 ]---
> 
> 
> With the guidance from the friendly folks at the debian bug tracker, we 
> could find that this happens with many kernel versions, as can be 
> derived from the following (condensed list below):
> 
> # journalctl  --grep '(Linux version|PCIe link lost)' --quiet | cat
> Aug 30 18:16:18 Zwerg kernel: Linux version 6.1.0-11-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.38-4 (2023-08-08)
> Sep 20 14:21:17 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Sep 20 19:47:06 Zwerg kernel: Linux version 6.1.0-11-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.38-4 (2023-08-08)
> Okt 04 17:16:08 Zwerg kernel: Linux version 6.1.0-12-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.52-1 (2023-09-07)
> Okt 06 05:44:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Okt 07 16:39:10 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) 
> (uninitialized): PCIe link lost, device now detached
> Okt 07 16:43:41 Zwerg kernel: Linux version 6.1.0-12-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.52-1 (2023-09-07)
> Okt 23 18:23:54 Zwerg kernel: Linux version 6.1.0-12-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.52-1 (2023-09-07)
> Okt 23 18:31:25 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Okt 23 18:48:58 Zwerg kernel: Linux version 6.1.0-13-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.55-1 (2023-09-29)
> Okt 30 11:16:06 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Okt 31 13:50:06 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) 
> (uninitialized): PCIe link lost, device now detached
> Okt 31 13:52:01 Zwerg kernel: Linux version 6.1.0-13-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.55-1 (2023-09-29)
> Nov 22 18:59:11 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Nov 23 12:18:19 Zwerg kernel: Linux version 6.1.0-13-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.55-1 (2023-09-29)
> Nov 23 15:45:49 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Nov 23 15:52:51 Zwerg kernel: Linux version 6.1.0-13-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.55-1 (2023-09-29)
> Dez 06 19:06:18 Zwerg kernel: Linux version 6.1.0-13-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.55-1 (2023-09-29)
> Dez 09 15:12:13 Zwerg kernel: Linux version 6.1.0-14-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.64-1 (2023-11-30)
> Dez 19 07:33:02 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Dez 20 10:29:21 Zwerg kernel: Linux version 6.1.0-15-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.66-1 (2023-12-09)
> Jan 01 09:57:40 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Jan 02 13:41:33 Zwerg kernel: Linux version 6.1.0-15-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.66-1 (2023-12-09)
> Jan 10 16:15:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Jan 13 11:02:41 Zwerg kernel: Linux version 6.1.0-17-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.69-1 (2023-12-30)
> Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Jan 13 11:18:13 Zwerg kernel: Linux version 6.1.0-17-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.69-1 (2023-12-30)
> Jan 19 14:25:08 Zwerg kernel: Linux version 6.1.0-1-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-13) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.39.90.20221231) #1 SMP PREEMPT_DYNAMIC 
> Debian 6.1.4-1 (2023-01-07)
> Jan 27 09:41:16 Zwerg kernel: Linux version 6.1.0-17-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.69-1 (2023-12-30)
> Jan 27 09:44:53 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Jan 27 09:48:05 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) 
> (uninitialized): PCIe link lost, device now detached
> Jan 27 09:52:16 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) 
> (uninitialized): PCIe link lost, device now detached
> Jan 27 09:58:46 Zwerg kernel: Linux version 6.1.0-1-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-13) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.39.90.20221231) #1 SMP PREEMPT_DYNAMIC 
> Debian 6.1.4-1 (2023-01-07)
> Feb 01 04:19:17 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Feb 01 14:43:03 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) 
> (uninitialized): PCIe link lost, device now detached
> Feb 01 14:50:04 Zwerg kernel: Linux version 6.1.0-17-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.69-1 (2023-12-30)
> Feb 01 15:28:42 Zwerg kernel: Linux version 6.5.0-0.deb12.4-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.5.10-1~bpo12+1 (2023-11-23)
> Feb 08 18:26:31 Zwerg kernel: Linux version 6.5.0-0.deb12.4-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.5.10-1~bpo12+1 (2023-11-23)
> Feb 08 18:33:38 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, 
> device now detached
> Feb 08 18:58:25 Zwerg kernel: Linux version 6.5.0-0.deb12.4-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.5.10-1~bpo12+1 (2023-11-23)
> Feb 08 19:00:32 Zwerg kernel: igc 0000:0b:00.0 eno1: PCIe link lost, 
> device now detached
> Feb 08 19:02:38 Zwerg kernel: igc 0000:0b:00.0 (unnamed net_device) 
> (uninitialized): PCIe link lost, device now detached
> Feb 08 19:05:30 Zwerg kernel: Linux version 6.5.0-0.deb12.4-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.5.10-1~bpo12+1 (2023-11-23)
> Feb 09 13:25:08 Zwerg kernel: igc 0000:0b:00.0 eno1: PCIe link lost, 
> device now detached
> Feb 09 13:27:17 Zwerg kernel: igc 0000:0b:00.0 (unnamed net_device) 
> (uninitialized): PCIe link lost, device now detached
> Feb 09 13:30:42 Zwerg kernel: Linux version 6.5.0-0.deb12.4-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.5.10-1~bpo12+1 (2023-11-23)
> Feb 11 18:47:57 Zwerg kernel: igc 0000:0b:00.0 eno1: PCIe link lost, 
> device now detached
> Feb 12 10:55:30 Zwerg kernel: Linux version 6.1.0-17-amd64 
> (debian-kernel@...ts.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 
> 6.1.69-1 (2023-12-30)
> 
> The kernel version I used were
> 
> Debian 6.1.4-1 (2023-01-07)
> Debian 6.1.38-4 (2023-08-08)
> Debian 6.1.52-1 (2023-09-07)
> Debian 6.1.55-1 (2023-09-29)
> Debian 6.1.64-1 (2023-11-30)
> Debian 6.1.66-1 (2023-12-09)
> Debian 6.1.69-1 (2023-12-30)
> Debian 6.5.10-1~bpo12+1 (2023-11-23)
> 
> 
> At this point, it looks like at least one person with a bit of insight 
> is convinced this is an upstream issue.
> 
> Of course I'll try to provide whatever information else may be needed.
> 
> Most importantly, I think, is the hardware surrounding the NIC:
> This is an ASUSTeK COMPUTER INC. ROG STRIX X670E-A GAMING WIFI, i.e. AMD 
> X670 chipset with fershly updated BIOS: 1904 01/29/2024. CPU is an AMD 
> Ryzen 9 7900X.
> 
> I have not set any particular overclocking or performance options, just 
> tried to have all firmware settings on "conservative".
> 
> 
> Mass storage is a Western Digital SN850X NVMe device.
> 
> I have experienced two cases where the storage device apparently 
> "vanished" from the PCIe bus, which resulted in a flood of journald 
> messages that it could not log anything to persistent storage. I have 
> never seen the first few lines of thos occurences, and obviously, I have 
> no logs.
> 
> I did notice, however, that the system still responded to pings on the 
> network.
> 
> All of this seems to indicate that this might be related to PCIe power 
> management. I suspect that my gut feeling is not the best starting point 
> to decide how to proceed here.
> 
> So, if you any way to improve this situation and make the system 
> reliably usable, I'm willing to help in any way I can, but you'll have 
> to tell me what to do!


[So Feb 25 13:36:36 2024] igc 0000:0b:00.0 eno1: PCIe link lost, device 
now detached
[So Feb 25 13:36:36 2024] ------------[ cut here ]------------
[So Feb 25 13:36:36 2024] igc: Failed to read reg 0xc030!
[So Feb 25 13:36:36 2024] WARNING: CPU: 20 PID: 1152218 at 
drivers/net/ethernet/intel/igc/igc_main.c:6583 igc_rd32+0x8d/0xa0 [igc]
[So Feb 25 13:36:36 2024] Modules linked in: cpuid isofs exfat rfcomm 
cpufreq_userspace cpufreq_powersave cpufreq_ondemand 
cpufreq_conservative nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 
dns_resolver nfs lockd grace fscache netfs qrtr overlay cmac algif_hash 
algif_skcipher af_alg bnep sunrpc binfmt_misc nls_ascii nls_cp437 vfat 
fat ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common btusb btrtl btbcm 
btintel mt7921e btmtk mt7921_common snd_hda_codec_hdmi bluetooth 
mt76_connac_lib edac_mce_amd mt76 snd_hda_intel kvm_amd snd_intel_dspcfg 
mac80211 snd_intel_sdw_acpi sha3_generic snd_usb_audio jitterentropy_rng 
snd_hda_codec uvcvideo kvm snd_usbmidi_lib videobuf2_vmalloc 
snd_hda_core uvc drbg videobuf2_memops snd_rawmidi libarc4 
videobuf2_v4l2 asus_nb_wmi snd_hwdep eeepc_wmi snd_seq_device asus_wmi 
ansi_cprng videodev irqbypass ecdh_generic battery snd_pcm ecc 
ledtrig_audio sparse_keymap cfg80211 crc16 rapl platform_profile 
videobuf2_common wmi_bmof sp5100_tco ccp snd_timer watchdog k10temp 
pcspkr mc snd soundcore rfkill joydev sg
[So Feb 25 13:36:36 2024]  evdev msr parport_pc ppdev lp parport fuse 
loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 xfs 
libcrc32c crc32c_generic sd_mod dm_crypt dm_mod uas usb_storage 
hid_generic amdgpu amdxcp drm_buddy usbhid gpu_sched i2c_algo_bit hid 
drm_suballoc_helper sr_mod drm_display_helper cdrom cec crc32_pclmul 
rc_core crc32c_intel drm_ttm_helper ghash_clmulni_intel ttm ahci 
xhci_pci sha512_ssse3 xhci_hcd libahci sha512_generic nvme 
drm_kms_helper libata nvme_core t10_pi usbcore aesni_intel drm 
crc64_rocksoft_generic scsi_mod crypto_simd cryptd crc64_rocksoft igc 
crc_t10dif crct10dif_generic crct10dif_pclmul i2c_piix4 crc64 
crct10dif_common usb_common scsi_common video wmi gpio_amdpt 
gpio_generic button
[So Feb 25 13:36:36 2024] CPU: 20 PID: 1152218 Comm: kworker/20:1 Not 
tainted 6.5.0-0.deb12.4-amd64 #1  Debian 6.5.10-1~bpo12+1
[So Feb 25 13:36:36 2024] Hardware name: ASUS System Product Name/ROG 
STRIX X670E-A GAMING WIFI, BIOS 1904 01/29/2024
[So Feb 25 13:36:36 2024] Workqueue: events igc_watchdog_task [igc]
[So Feb 25 13:36:36 2024] RIP: 0010:igc_rd32+0x8d/0xa0 [igc]
[So Feb 25 13:36:36 2024] Code: 48 c7 c6 10 c6 37 c0 e8 81 1a 00 e8 48 
8b bb 28 ff ff ff e8 05 82 d6 e7 84 c0 74 bc 89 ee 48 c7 c7 38 c6 37 c0 
e8 c3 9e 75 e7 <0f> 0b eb aa b8 ff ff ff ff e9 15 7f 26 e8 0f 1f 44 00 
00 90 90 90
[So Feb 25 13:36:36 2024] RSP: 0018:ffffba4fe269fdd8 EFLAGS: 00010282
[So Feb 25 13:36:36 2024] RAX: 0000000000000000 RBX: ffff9cdc625c4cb8 
RCX: 0000000000000027
[So Feb 25 13:36:36 2024] RDX: ffff9ceb5e7213c8 RSI: 0000000000000001 
RDI: ffff9ceb5e7213c0
[So Feb 25 13:36:36 2024] RBP: 000000000000c030 R08: 0000000000000000 
R09: ffffba4fe269fc68
[So Feb 25 13:36:36 2024] R10: 0000000000000003 R11: ffff9ceb9de24d28 
R12: ffff9cdc625c4000
[So Feb 25 13:36:36 2024] R13: 0000000000000000 R14: ffff9cdc618b5d40 
R15: 000000000000c030
[So Feb 25 13:36:36 2024] FS:  0000000000000000(0000) 
GS:ffff9ceb5e700000(0000) knlGS:0000000000000000
[So Feb 25 13:36:36 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[So Feb 25 13:36:36 2024] CR2: 00007f030a057000 CR3: 000000047234a000 
CR4: 0000000000750ee0
[So Feb 25 13:36:36 2024] PKRU: 55555554
[So Feb 25 13:36:36 2024] Call Trace:
[So Feb 25 13:36:36 2024]  <TASK>
[So Feb 25 13:36:36 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[So Feb 25 13:36:36 2024]  ? __warn+0x81/0x130
[So Feb 25 13:36:36 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[So Feb 25 13:36:36 2024]  ? report_bug+0x171/0x1a0
[So Feb 25 13:36:36 2024]  ? srso_alias_return_thunk+0x5/0x7f
[So Feb 25 13:36:36 2024]  ? prb_read_valid+0x1b/0x30
[So Feb 25 13:36:36 2024]  ? handle_bug+0x41/0x70
[So Feb 25 13:36:36 2024]  ? exc_invalid_op+0x17/0x70
[So Feb 25 13:36:36 2024]  ? asm_exc_invalid_op+0x1a/0x20
[So Feb 25 13:36:36 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[So Feb 25 13:36:36 2024]  ? igc_rd32+0x8d/0xa0 [igc]
[So Feb 25 13:36:36 2024]  igc_update_stats+0x8a/0x6d0 [igc]
[So Feb 25 13:36:36 2024]  igc_watchdog_task+0x9d/0x4a0 [igc]
[So Feb 25 13:36:36 2024]  process_one_work+0x1df/0x3e0
[So Feb 25 13:36:36 2024]  worker_thread+0x51/0x390
[So Feb 25 13:36:36 2024]  ? __pfx_worker_thread+0x10/0x10
[So Feb 25 13:36:36 2024]  kthread+0xe5/0x120
[So Feb 25 13:36:36 2024]  ? __pfx_kthread+0x10/0x10
[So Feb 25 13:36:36 2024]  ret_from_fork+0x31/0x50
[So Feb 25 13:36:36 2024]  ? __pfx_kthread+0x10/0x10
[So Feb 25 13:36:36 2024]  ret_from_fork_asm+0x1b/0x30
[So Feb 25 13:36:36 2024]  </TASK>
[So Feb 25 13:36:36 2024] ---[ end trace 0000000000000000 ]---


Cheers,

Arno

-- 
Arno Lehmann

IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ