Message-ID: <f6162b22-c6c6-47d7-9bda-dd702fcc5b4b@hixontech.com>
Date: Sun, 21 Jul 2024 03:13:03 -0600
From: Chris Hixon <linux-kernel-bugs@...ontech.com>
To: Basavaraj Natikar <bnatikar@....com>,
Linux regressions mailing list <regressions@...ts.linux.dev>,
Jiri Kosina <jkosina@...e.com>, Benjamin Tissoires <bentiss@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Li Ma <li.ma@....com>,
amd-gfx@...ts.freedesktop.org, Alexander.Deucher@....com,
yifan1.zhang@....com, linux-input@...r.kernel.org,
Basavaraj Natikar <basavaraj.natikar@....com>
Subject: Re: [BUG] HID: amd_sfh (drivers/hid/amd-sfh-hid/): memory/page
corruption
On 7/21/24 00:20, Basavaraj Natikar wrote:
> On 7/17/2024 4:51 PM, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 15.07.24 06:39, Chris Hixon wrote:
>>> System: HP ENVY x360 Convertible 15-ds1xxx; AMD Ryzen 7 4700U with
>>> Radeon Graphics
>>>
>>> Problem commits (introduced in v6.9-rc1):
>>> 6296562f30b1 HID: amd_sfh: Extend MP2 register access to SFH
>>> 2105e8e00da4 HID: amd_sfh: Improve boot time when SFH is available
>>>
>>> It appears amd_sfh commits 6296562f30b1 and 2105e8e00da4 correlate with
>>> some form of memory/page corruption.
>> Hi! From a quick search on lore it looks like Basavaraj Natikar, who
>> authored those two commits, has been inactive for a few days. This is
>> totally fine, but given the nature of the problem slightly unfortunate.
>> That's why I'm trying to raise awareness of this report by adding the
>> subsystem maintainers, a few lists, and a few people who were involved
>> in the submission of those two patches to the list of recipients.
>> With a bit of luck somebody might be able to help out. Ciao, Thorsten
>>
>>> On my system, this typically
>>> presents itself as a page dump followed by BTRFS errors, usually
>>> involving "corrupt leaf" (see dmesg output below); often the BTRFS
>>> filesystem becomes read-only afterwards. Note that the underlying NVMe
>>> disk seems fine, and the BTRFS filesystem does not actually appear to be
>>> corrupt when booted/checked from kernels without this bug (no BTRFS
>>> errors or I/O errors reported on non-problem kernels).
>>>
>>> I have no problems when I blacklist the amd_sfh module (any kernel
>>> version), or revert both commits 6296562f30b1 and 2105e8e00da4 (on
>>> stable, linux-6.9.y). I have no problems on any recent linux-mainline
>>> (v6.10{,-rc*}) when reverting these two commits (in addition to
>>> reverting 7902ec988a9a and 6856f079cd45 to successfully build the
>>> kernel). I have had no problems with any v6.6.y, v6.7.y, or v6.8.y version.
>>>
>>> It is curious that BTRFS always seems to be involved, yet the problems go
>>> away with these amd_sfh commits reverted (or amd_sfh disabled).
>>>
>>> Further notes:
>>>
>>> I have not specifically used the amd_sfh module for anything. As far as
>>> I've been able to determine, my system has the "Sensor Fusion Hub" MP2
>>> chip but no supported sensors/sub-devices (unless I need to do
>>> something to enable them, or there is an error while detecting
>>> sensors). All logs I've checked contain something like:
>>>
>>> Jul 09 04:14:37 arch kernel: pcie_mp2_amd 0000:04:00.7: enabling device (0000 -> 0002)
>>> Jul 09 04:15:07 arch kernel: pcie_mp2_amd 0000:04:00.7: Failed to discover, sensors not enabled is 0
>>> Jul 09 04:15:07 arch kernel: pcie_mp2_amd 0000:04:00.7: amd_sfh_hid_client_init failed err -95
>>>
>>> Excerpt from lshw:
>>>   *-generic:1 UNCLAIMED
>>>        description: Signal processing controller
>>>        product: Sensor Fusion Hub
>>>        vendor: Advanced Micro Devices, Inc. [AMD]
>>>        physical id: 0.7
>>>        bus info: pci@...0:04:00.7
>>>        version: 00
>>>        width: 32 bits
>>>        clock: 33MHz
>>>        capabilities: pm pciexpress msi msix cap_list
>>>        configuration: latency=0
>>>        resources: memory:fe000000-fe0fffff memory:fe4cc000-fe4cdfff
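
The `err -95` in the pcie_mp2_amd log lines above is a negative errno value, as kernel drivers usually report failures. A small Python sketch, assuming a Linux host (errno numbers differ on other systems; the helper name is hypothetical), that maps such a value back to its symbolic name(s). On Linux, 95 is EOPNOTSUPP ("Operation not supported"), which appears to fit the driver giving up because no sensors were enabled rather than hitting an I/O error.

```python
import errno
import os


def describe_kernel_err(err: int) -> str:
    """Turn a negative errno printed by the kernel (e.g. "err -95") into names."""
    code = -err
    # Several symbolic names can share one number (e.g. ENOTSUP/EOPNOTSUPP on Linux).
    names = sorted(n for n in dir(errno)
                   if n.startswith("E") and getattr(errno, n) == code)
    return f"{err}: {'/'.join(names)} ({os.strerror(code)})"


if __name__ == "__main__":
    print(describe_kernel_err(-95))
```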
>
> Could you please check with the latest version, including the patch below?
>
> https://lore.kernel.org/all/20240718111616.3012155-1-Basavaraj.Natikar@amd.com/
>
> Thanks,
> --
> Basavaraj
>
Hi,
Unfortunately, that patch doesn't fix the issue, although the crashes now look different:
Test #1, patch applied to linux-mainline 720261cfc732:
------------------------------------------------------
Jul 19 03:07:52 arch kernel: pcie_mp2_amd 0000:04:00.7: enabling device (0000 -> 0002)
...
Jul 19 03:08:22 arch kernel: pcie_mp2_amd 0000:04:00.7: Failed to discover, sensors not enabled is 0
Jul 19 03:08:22 arch kernel: pcie_mp2_amd 0000:04:00.7: amd_sfh_hid_client_init failed err -95
...
Jul 19 03:08:33 arch kernel: journalctl: Corrupted page table at address 7f364f200098
Jul 19 03:08:33 arch kernel: PGD 12532c067 P4D 12532c067 PUD 1277e7067 PMD 110dd1067 PTE ffffdbdefffffd9f
Jul 19 03:08:33 arch kernel: Oops: Bad pagetable: 000d [#1] PREEMPT SMP NOPTI
Jul 19 03:08:33 arch kernel: CPU: 4 PID: 2366 Comm: journalctl Not tainted 6.10.0-08676-g720261cfc732-dirty #1 323ec4760c6f8b2d83974caae365d400b43e1fa0
Jul 19 03:08:33 arch kernel: Hardware name: HP HP ENVY x360 Convertible 15-ds1xxx/87A9, BIOS F.14 10/13/2023
Jul 19 03:08:33 arch kernel: RIP: 0033:0x7f365b0f22fa
Jul 19 03:08:33 arch kernel: Code: 45 c0 00 00 00 00 48 c7 45 b8 00 00 00 00 4d 85 e4 0f 84 81 06 00 00 4c 89 e6 48 89 df e8 ae 51 00 00 49 8b 94 24 d0 00 00 00 <48> 8b 82 98 00 00 00 45 3b 6c 24 1c 0f 84 9c 01 00 00 49 89 44 24
Jul 19 03:08:33 arch kernel: RSP: 002b:00007ffe3bed6020 EFLAGS: 00010206
Jul 19 03:08:33 arch kernel: RAX: 0000000000000000 RBX: 0000560c079673c0 RCX: 0000000000000001
Jul 19 03:08:33 arch kernel: RDX: 00007f364f200000 RSI: 0000560c0797f040 RDI: 0000560c079673c0
Jul 19 03:08:33 arch kernel: RBP: 00007ffe3bed60a0 R08: 0000560c07966010 R09: 0000000000000007
Jul 19 03:08:33 arch kernel: R10: 0000560c07968e40 R11: 0000560c07968ad0 R12: 0000560c0797f040
Jul 19 03:08:33 arch kernel: R13: 0000000000000001 R14: 0000560c07968858 R15: 0000000000000000
Jul 19 03:08:33 arch kernel: FS: 00007f365ab69880 GS: 0000000000000000
Jul 19 03:08:33 arch kernel: Modules linked in: nft_masq nft_chain_nat nf_nat bridge stp llc vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb ccm snd_seq_dummy snd_hrtimer rfcomm snd_seq snd_seq_device uhid cmac algif_hash algif_skcipher af_alg nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bnep snd_ctl_led iwlmvm mac80211 snd_acp3x_pdm_dma snd_acp3x_rn snd_soc_dmic snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_xtensa_dsp snd_sof_pci snd_sof snd_hda_codec_realtek intel_rapl_msr snd_sof_utils intel_rapl_common btusb uvcvideo snd_hda_codec_generic btrtl snd_soc_core libarc4 kvm_amd snd_hda_scodec_component btintel videobuf2_vmalloc btbcm snd_hda_codec_hdmi videobuf2_memops btmtk uvc iwlwifi snd_compress snd_hda_intel snd_pci_ps videobuf2_v4l2 snd_intel_dspcfg kvm snd_rpl_pci_acp6x bluetooth snd_hda_codec videodev snd_acp_pci cfg80211 snd_hwdep snd_hda_core snd_acp_legacy_common snd_pci_acp6x videobuf2_common mc snd_pcm rapl hp_wmi snd_pci_acp5x sparse_keymap pcspkr
Jul 19 03:08:33 arch kernel: snd_timer ucsi_acpi snd_rn_pci_acp3x wmi_bmof platform_profile acpi_cpufreq snd typec_ucsi snd_acp_config amd_sfh rfkill snd_soc_acpi sp5100_tco k10temp roles soundcore snd_pci_acp3x i2c_piix4 typec hp_accel wireless_hotkey lis3lv02d acpi_tad amd_pmc joydev mousedev mac_hid nls_iso8859_1 vfat fat usbip_host usbip_core crypto_user loop nfnetlink zram ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod amdgpu amdxcp i2c_algo_bit crct10dif_pclmul crc32_pclmul drm_ttm_helper crc32c_intel ttm polyval_clmulni polyval_generic rtsx_pci_sdmmc gf128mul mmc_core ghash_clmulni_intel drm_exec sha512_ssse3 nvme gpu_sched serio_raw sha256_ssse3 atkbd sha1_ssse3 nvme_core drm_suballoc_helper libps2 aesni_intel drm_buddy vivaldi_fmap drm_display_helper crypto_simd ccp cryptd rtsx_pci video xhci_pci i8042 nvme_auth xhci_pci_renesas crc16 wmi serio 9pnet_virtio 9p 9pnet netfs virtio_net net_failover failover virtio_blk hid_multitouch
Jul 19 03:08:33 arch kernel: i2c_hid_acpi i2c_hid
Jul 19 03:08:33 arch kernel: ---[ end trace 0000000000000000 ]---
Jul 19 03:08:33 arch kernel: RIP: 0033:0x7f365b0f22fa
Jul 19 03:08:33 arch kernel: RSP: 002b:00007ffe3bed6020 EFLAGS: 00010206
Jul 19 03:08:33 arch kernel: RAX: 0000000000000000 RBX: 0000560c079673c0 RCX: 0000000000000001
Jul 19 03:08:33 arch kernel: RDX: 00007f364f200000 RSI: 0000560c0797f040 RDI: 0000560c079673c0
Jul 19 03:08:33 arch kernel: RBP: 00007ffe3bed60a0 R08: 0000560c07966010 R09: 0000000000000007
Jul 19 03:08:33 arch kernel: R10: 0000560c07968e40 R11: 0000560c07968ad0 R12: 0000560c0797f040
Jul 19 03:08:33 arch kernel: R13: 0000000000000001 R14: 0000560c07968858 R15: 0000000000000000
Jul 19 03:08:33 arch kernel: FS: 00007f365ab69880(0000) GS:ffff9641bf600000(0000) knlGS:0000000000000000
Jul 19 03:08:33 arch kernel: CS: 0033 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 19 03:08:33 arch kernel: CR2: 00007f364f200098 CR3: 00000001224c2000 CR4: 0000000000350ef0
Jul 19 03:08:33 arch kernel: note: journalctl[2366] exited with irqs disabled
Jul 19 03:08:33 arch kernel: BUG: Bad page map in process journalctl pte:ffffdbdefffffd9f pmd:110dd1067
Jul 19 03:08:33 arch kernel: addr:00007f364f200000 vm_flags:080000d1 anon_vma:0000000000000000 mapping:ffff963ec4ef7be8 index:0
Jul 19 03:08:33 arch kernel: file:system@...ed14e1df74095b272f6fbd1a5cf90-000000000063dc49-000615f60b98c78b.journal fault:filemap_fault mmap:btrfs_file_mmap [btrfs] read_folio:btrfs_read_folio [btrfs]
Jul 19 03:08:33 arch kernel: CPU: 4 PID: 2366 Comm: journalctl Tainted: G D 6.10.0-08676-g720261cfc732-dirty #1 323ec4760c6f8b2d83974caae365d400b43e1fa0
Jul 19 03:08:33 arch kernel: Hardware name: HP HP ENVY x360 Convertible 15-ds1xxx/87A9, BIOS F.14 10/13/2023
Jul 19 03:08:33 arch kernel: Call Trace:
Jul 19 03:08:33 arch kernel: <TASK>
Jul 19 03:08:33 arch kernel: dump_stack_lvl+0x5d/0x80
Jul 19 03:08:33 arch kernel: print_bad_pte.cold+0x76/0xd9
Jul 19 03:08:33 arch kernel: vm_normal_page+0xd6/0xf0
Jul 19 03:08:33 arch kernel: unmap_page_range+0x5d7/0x17f0
Jul 19 03:08:33 arch kernel: unmap_vmas+0xbd/0x1a0
Jul 19 03:08:33 arch kernel: exit_mmap+0xf6/0x3b0
Jul 19 03:08:33 arch kernel: __mmput+0x3e/0x130
Jul 19 03:08:33 arch kernel: do_exit+0x2ce/0xad0
Jul 19 03:08:33 arch kernel: make_task_dead+0x90/0x90
Jul 19 03:08:33 arch kernel: rewind_stack_and_make_dead+0x16/0x20
Jul 19 03:08:33 arch kernel: RIP: 0033:0x7f365b0f22fa
Jul 19 03:08:33 arch kernel: Code: Unable to access opcode bytes at 0x7f365b0f22d0.
Jul 19 03:08:33 arch kernel: RSP: 002b:00007ffe3bed6020 EFLAGS: 00010206
Jul 19 03:08:33 arch kernel: RAX: 0000000000000000 RBX: 0000560c079673c0 RCX: 0000000000000001
Jul 19 03:08:33 arch kernel: RDX: 00007f364f200000 RSI: 0000560c0797f040 RDI: 0000560c079673c0
Jul 19 03:08:33 arch kernel: RBP: 00007ffe3bed60a0 R08: 0000560c07966010 R09: 0000000000000007
Jul 19 03:08:33 arch kernel: R10: 0000560c07968e40 R11: 0000560c07968ad0 R12: 0000560c0797f040
Jul 19 03:08:33 arch kernel: R13: 0000000000000001 R14: 0000560c07968858 R15: 0000000000000000
Jul 19 03:08:33 arch kernel: </TASK>
Jul 19 03:08:33 arch kernel: BUG: Bad page map in process journalctl pte:200000e1d pmd:110dd1067
Jul 19 03:08:33 arch kernel: addr:00007f364f201000 vm_flags:080000d1 anon_vma:0000000000000000 mapping:ffff963ec4ef7be8 index:1
Jul 19 03:08:33 arch kernel: file:system@...ed14e1df74095b272f6fbd1a5cf90-000000000063dc49-000615f60b98c78b.journal fault:filemap_fault mmap:btrfs_file_mmap [btrfs] read_folio:btrfs_read_folio [btrfs]
Jul 19 03:08:33 arch kernel: CPU: 4 PID: 2366 Comm: journalctl Tainted: G B D 6.10.0-08676-g720261cfc732-dirty #1 323ec4760c6f8b2d83974caae365d400b43e1fa0
Jul 19 03:08:33 arch kernel: Hardware name: HP HP ENVY x360 Convertible 15-ds1xxx/87A9, BIOS F.14 10/13/2023
Jul 19 03:08:33 arch kernel: Call Trace:
Jul 19 03:08:33 arch kernel: <TASK>
Jul 19 03:08:33 arch kernel: dump_stack_lvl+0x5d/0x80
Jul 19 03:08:33 arch kernel: print_bad_pte.cold+0x76/0xd9
Jul 19 03:08:33 arch kernel: vm_normal_page+0xd6/0xf0
Jul 19 03:08:33 arch kernel: unmap_page_range+0x5d7/0x17f0
Jul 19 03:08:33 arch kernel: unmap_vmas+0xbd/0x1a0
Jul 19 03:08:33 arch kernel: exit_mmap+0xf6/0x3b0
Jul 19 03:08:33 arch kernel: __mmput+0x3e/0x130
Jul 19 03:08:33 arch kernel: do_exit+0x2ce/0xad0
Jul 19 03:08:33 arch kernel: make_task_dead+0x90/0x90
Jul 19 03:08:33 arch kernel: rewind_stack_and_make_dead+0x16/0x20
Jul 19 03:08:33 arch kernel: RIP: 0033:0x7f365b0f22fa
Jul 19 03:08:33 arch kernel: Code: Unable to access opcode bytes at 0x7f365b0f22d0.
Jul 19 03:08:33 arch kernel: RSP: 002b:00007ffe3bed6020 EFLAGS: 00010206
Jul 19 03:08:33 arch kernel: RAX: 0000000000000000 RBX: 0000560c079673c0 RCX: 0000000000000001
Jul 19 03:08:33 arch kernel: RDX: 00007f364f200000 RSI: 0000560c0797f040 RDI: 0000560c079673c0
Jul 19 03:08:33 arch kernel: RBP: 00007ffe3bed60a0 R08: 0000560c07966010 R09: 0000000000000007
Jul 19 03:08:33 arch kernel: R10: 0000560c07968e40 R11: 0000560c07968ad0 R12: 0000560c0797f040
Jul 19 03:08:33 arch kernel: R13: 0000000000000001 R14: 0000560c07968858 R15: 0000000000000000
Jul 19 03:08:33 arch kernel: </TASK>
Jul 19 03:08:33 arch kernel: BUG: Bad page map in process journalctl pte:30ffffff8f pmd:120674067
Jul 19 03:08:33 arch kernel: addr:00007f364f600000 vm_flags:080000d1 anon_vma:0000000000000000 mapping:ffff963ec4ef7be8 index:400
Jul 19 03:08:33 arch kernel: file:system@...ed14e1df74095b272f6fbd1a5cf90-000000000063dc49-000615f60b98c78b.journal fault:filemap_fault mmap:btrfs_file_mmap [btrfs] read_folio:btrfs_read_folio [btrfs]
Jul 19 03:08:33 arch kernel: CPU: 4 PID: 2366 Comm: journalctl Tainted: G B D 6.10.0-08676-g720261cfc732-dirty #1 323ec4760c6f8b2d83974caae365d400b43e1fa0
Jul 19 03:08:33 arch kernel: Hardware name: HP HP ENVY x360 Convertible 15-ds1xxx/87A9, BIOS F.14 10/13/2023
Jul 19 03:08:33 arch kernel: Call Trace:
Jul 19 03:08:33 arch kernel: <TASK>
Jul 19 03:08:33 arch kernel: dump_stack_lvl+0x5d/0x80
Jul 19 03:08:33 arch kernel: print_bad_pte.cold+0x76/0xd9
Jul 19 03:08:33 arch kernel: vm_normal_page+0xd6/0xf0
Jul 19 03:08:33 arch kernel: unmap_page_range+0x5d7/0x17f0
Jul 19 03:08:33 arch kernel: unmap_vmas+0xbd/0x1a0
Jul 19 03:08:33 arch kernel: exit_mmap+0xf6/0x3b0
Jul 19 03:08:33 arch kernel: __mmput+0x3e/0x130
Jul 19 03:08:33 arch kernel: do_exit+0x2ce/0xad0
Jul 19 03:08:33 arch kernel: make_task_dead+0x90/0x90
Jul 19 03:08:33 arch kernel: rewind_stack_and_make_dead+0x16/0x20
Jul 19 03:08:33 arch kernel: RIP: 0033:0x7f365b0f22fa
Jul 19 03:08:33 arch kernel: Code: Unable to access opcode bytes at 0x7f365b0f22d0.
Jul 19 03:08:33 arch kernel: RSP: 002b:00007ffe3bed6020 EFLAGS: 00010206
Jul 19 03:08:33 arch kernel: RAX: 0000000000000000 RBX: 0000560c079673c0 RCX: 0000000000000001
Jul 19 03:08:33 arch kernel: RDX: 00007f364f200000 RSI: 0000560c0797f040 RDI: 0000560c079673c0
Jul 19 03:08:33 arch kernel: RBP: 00007ffe3bed60a0 R08: 0000560c07966010 R09: 0000000000000007
Jul 19 03:08:33 arch kernel: R10: 0000560c07968e40 R11: 0000560c07968ad0 R12: 0000560c0797f040
Jul 19 03:08:33 arch kernel: R13: 0000000000000001 R14: 0000560c07968858 R15: 0000000000000000
Jul 19 03:08:33 arch kernel: </TASK>
Jul 19 03:08:33 arch kernel: BUG: Bad page map in process journalctl pte:ffffffe6 pmd:120674067
Jul 19 03:08:33 arch kernel: addr:00007f364f601000 vm_flags:080000d1 anon_vma:0000000000000000 mapping:ffff963ec4ef7be8 index:401
Jul 19 03:08:33 arch kernel: file:system@...ed14e1df74095b272f6fbd1a5cf90-000000000063dc49-000615f60b98c78b.journal fault:filemap_fault mmap:btrfs_file_mmap [btrfs] read_folio:btrfs_read_folio [btrfs]
Jul 19 03:08:33 arch kernel: CPU: 4 PID: 2366 Comm: journalctl Tainted: G B D 6.10.0-08676-g720261cfc732-dirty #1 323ec4760c6f8b2d83974caae365d400b43e1fa0
Jul 19 03:08:33 arch kernel: Hardware name: HP HP ENVY x360 Convertible 15-ds1xxx/87A9, BIOS F.14 10/13/2023
Jul 19 03:08:33 arch kernel: Call Trace:
Jul 19 03:08:33 arch kernel: <TASK>
Jul 19 03:08:33 arch kernel: dump_stack_lvl+0x5d/0x80
Jul 19 03:08:33 arch kernel: print_bad_pte.cold+0x76/0xd9
Jul 19 03:08:33 arch kernel: vm_normal_page+0xd6/0xf0
Jul 19 03:08:33 arch kernel: unmap_page_range+0x5d7/0x17f0
Jul 19 03:08:33 arch kernel: unmap_vmas+0xbd/0x1a0
Jul 19 03:08:33 arch kernel: exit_mmap+0xf6/0x3b0
Jul 19 03:08:33 arch kernel: __mmput+0x3e/0x130
Jul 19 03:08:33 arch kernel: do_exit+0x2ce/0xad0
Jul 19 03:08:33 arch kernel: make_task_dead+0x90/0x90
Jul 19 03:08:33 arch kernel: rewind_stack_and_make_dead+0x16/0x20
Jul 19 03:08:33 arch kernel: RIP: 0033:0x7f365b0f22fa
Jul 19 03:08:33 arch kernel: Code: Unable to access opcode bytes at 0x7f365b0f22d0.
Jul 19 03:08:33 arch kernel: RSP: 002b:00007ffe3bed6020 EFLAGS: 00010206
Jul 19 03:08:33 arch kernel: RAX: 0000000000000000 RBX: 0000560c079673c0 RCX: 0000000000000001
Jul 19 03:08:33 arch kernel: RDX: 00007f364f200000 RSI: 0000560c0797f040 RDI: 0000560c079673c0
Jul 19 03:08:33 arch kernel: RBP: 00007ffe3bed60a0 R08: 0000560c07966010 R09: 0000000000000007
Jul 19 03:08:33 arch kernel: R10: 0000560c07968e40 R11: 0000560c07968ad0 R12: 0000560c0797f040
Jul 19 03:08:33 arch kernel: R13: 0000000000000001 R14: 0000560c07968858 R15: 0000000000000000
Jul 19 03:08:33 arch kernel: </TASK>
Jul 19 03:08:33 arch kernel: BUG: Bad rss-counter state mm:0000000043d5429f type:MM_FILEPAGES val:1
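
A minimal Python sketch for checking how the Test #1 numbers fit together, assuming the standard x86-64 page-fault error-code bits (named after the kernel's X86_PF_* flags) and the 4 KiB PTE layout; the helper names are hypothetical. It decodes error code 000d as PROT | USER | RSVD, i.e. a user-mode read that hit a paging entry with reserved bits set (the case in which the kernel reports "Corrupted page table"), splits the reported PTE ffffdbdefffffd9f into flag bits and its physical-address field, and confirms that CR2 is RDX plus the 0x98 displacement of the faulting `mov rax,[rdx+0x98]` (`<48> 8b 82 98 00 00 00` in the Code: line).

```python
from __future__ import annotations

# x86 page-fault error code bits (kernel names X86_PF_PROT, _WRITE, ...).
PF_BITS = {0: "PROT", 1: "WRITE", 2: "USER", 3: "RSVD", 4: "INSTR"}

# Flag bits of a 4 KiB page-table entry.
PTE_FLAGS = {0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT", 4: "PCD",
             5: "ACCESSED", 6: "DIRTY", 7: "PAT", 8: "GLOBAL", 63: "NX"}

# Physical-address field of a PTE: bits 12..51 (architectural maximum).
PTE_ADDR_MASK = ((1 << 52) - 1) & ~0xFFF


def decode_pf_error(code: int) -> list[str]:
    """Names of the page-fault error-code bits that are set."""
    return [name for bit, name in PF_BITS.items() if code & (1 << bit)]


def decode_pte(pte: int) -> tuple[list[str], int]:
    """Return (names of set flag bits, physical-address field) for a raw PTE."""
    flags = [name for bit, name in PTE_FLAGS.items() if pte & (1 << bit)]
    return flags, pte & PTE_ADDR_MASK


if __name__ == "__main__":
    # Values copied from the Test #1 log.
    print("error code 000d:", decode_pf_error(0x000D))
    flags, phys = decode_pte(0xFFFFDBDEFFFFFD9F)
    print("PTE flags:", flags)
    print(f"PTE physical-address field: {phys:#x}")
    # Faulting instruction 48 8b 82 98 00 00 00 = mov rax, [rdx + 0x98]
    rdx, cr2 = 0x00007F364F200000, 0x00007F364F200098
    print("CR2 == RDX + 0x98:", cr2 == rdx + 0x98)
```

The physical-address field comes out at roughly 4 PiB, far beyond the RAM in this laptop, which plausibly explains the RSVD bit: the entry looks like overwritten data rather than a valid mapping.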

Test #2, patch applied to linux-mainline 2c9b3512402e:
------------------------------------------------------
Jul 21 02:45:19 arch kernel: pcie_mp2_amd 0000:04:00.7: enabling device (0000 -> 0002)
...
Jul 21 02:45:49 arch kernel: pcie_mp2_amd 0000:04:00.7: Failed to discover, sensors not enabled is 0
Jul 21 02:45:49 arch kernel: pcie_mp2_amd 0000:04:00.7: amd_sfh_hid_client_init failed err -95
Jul 21 02:45:50 arch kernel: BUG: unable to handle page fault for address: 000000b4ffffed39
Jul 21 02:45:50 arch kernel: #PF: supervisor read access in kernel mode
Jul 21 02:45:50 arch kernel: #PF: error_code(0x0000) - not-present page
Jul 21 02:45:50 arch kernel: PGD 0 P4D 0
Jul 21 02:45:50 arch kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jul 21 02:45:50 arch kernel: CPU: 2 PID: 1648 Comm: gnome-shell Not tainted 6.10.0-11185-g2c9b3512402e-dirty #1 909b642174274273e9a5ff42844d49a454a06a9d
Jul 21 02:45:50 arch kernel: Hardware name: HP HP ENVY x360 Convertible 15-ds1xxx/87A9, BIOS F.14 10/13/2023
Jul 21 02:45:50 arch kernel: RIP: 0010:poll_freewait+0x58/0xa0
Jul 21 02:45:50 arch kernel: Code: 83 c3 40 e8 9a c4 ce ff 48 8b 7b b0 e8 21 33 fe ff 41 39 6d 28 7f de 4d 85 e4 74 40 49 8b 5c 24 08 49 8d 6c 24 10 48 83 eb 40 <48> 8b 7b 38 48 8d 73 10 e8 6b c4 ce ff 48 8b 3b e8 f3 32 fe ff 48
Jul 21 02:45:50 arch kernel: RSP: 0018:ffffafd196e578e8 EFLAGS: 00010202
Jul 21 02:45:50 arch kernel: RAX: dead000000000122 RBX: 000000b4ffffed01 RCX: 0000000000000001
Jul 21 02:45:50 arch kernel: RDX: ffffa09a0bc10590 RSI: 0000000000000292 RDI: ffffa09a00e95700
Jul 21 02:45:50 arch kernel: RBP: ffffa09a09197010 R08: 0000000000000001 R09: 0000000000000001
Jul 21 02:45:50 arch kernel: R10: 0000000000000002 R11: 0000000000000001 R12: ffffa09a09197000
Jul 21 02:45:50 arch kernel: R13: ffffafd196e57a70 R14: 0000000000000011 R15: 0000000000000001
Jul 21 02:45:50 arch kernel: FS: 00007f90be94ed80(0000) GS:ffffa09cff700000(0000) knlGS:0000000000000000
Jul 21 02:45:50 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 21 02:45:50 arch kernel: CR2: 000000b4ffffed39 CR3: 000000010dfbc000 CR4: 0000000000350ef0
Jul 21 02:45:50 arch kernel: Call Trace:
Jul 21 02:45:50 arch kernel: <TASK>
Jul 21 02:45:50 arch kernel: ? __die_body.cold+0x19/0x27
Jul 21 02:45:50 arch kernel: ? page_fault_oops+0x15a/0x2d0
Jul 21 02:45:50 arch kernel: ? exc_page_fault+0x81/0x190
Jul 21 02:45:50 arch kernel: ? asm_exc_page_fault+0x26/0x30
Jul 21 02:45:50 arch kernel: ? poll_freewait+0x58/0xa0
Jul 21 02:45:50 arch kernel: ? poll_freewait+0x3f/0xa0
Jul 21 02:45:50 arch kernel: do_sys_poll+0x4e4/0x600
Jul 21 02:45:50 arch kernel: ? __pfx_pollwake+0x10/0x10
Jul 21 02:45:50 arch kernel: ? __pfx_pollwake+0x10/0x10
Jul 21 02:45:50 arch kernel: ? __pfx_pollwake+0x10/0x10
Jul 21 02:45:50 arch kernel: ? __pfx_pollwake+0x10/0x10
Jul 21 02:45:50 arch kernel: ? __pfx_pollwake+0x10/0x10
Jul 21 02:45:50 arch kernel: ? __pfx_pollwake+0x10/0x10
Jul 21 02:45:50 arch kernel: ? __pfx_pollwake+0x10/0x10
Jul 21 02:45:50 arch kernel: ? __pfx_pollwake+0x10/0x10
Jul 21 02:45:50 arch kernel: ? __pfx_pollwake+0x10/0x10
Jul 21 02:45:50 arch kernel: ? do_syscall_64+0x8e/0x190
Jul 21 02:45:50 arch kernel: __x64_sys_poll+0xd0/0x180
Jul 21 02:45:50 arch kernel: do_syscall_64+0x82/0x190
Jul 21 02:45:50 arch kernel: ? __pfx_read_tsc+0x10/0x10
Jul 21 02:45:50 arch kernel: ? __pm_runtime_suspend+0x69/0xc0
Jul 21 02:45:50 arch kernel: ? amdgpu_drm_ioctl+0x71/0x90 [amdgpu 706919ea96bf2f5c4d15446a617826d78535f0ff]
Jul 21 02:45:50 arch kernel: ? syscall_exit_to_user_mode+0x10/0x200
Jul 21 02:45:50 arch kernel: ? do_syscall_64+0x8e/0x190
Jul 21 02:45:50 arch kernel: ? syscall_exit_to_user_mode+0x10/0x200
Jul 21 02:45:50 arch kernel: ? do_syscall_64+0x8e/0x190
Jul 21 02:45:50 arch kernel: ? syscall_exit_to_user_mode+0x10/0x200
Jul 21 02:45:50 arch kernel: ? do_syscall_64+0x8e/0x190
Jul 21 02:45:50 arch kernel: ? __irq_exit_rcu+0x4a/0xb0
Jul 21 02:45:50 arch kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 21 02:45:50 arch kernel: RIP: 0033:0x7f90c2f1c39d
Jul 21 02:45:50 arch kernel: Code: e5 48 83 ec 20 89 55 ec 48 89 75 f0 48 89 7d f8 e8 08 6a f8 ff 8b 55 ec 48 8b 75 f0 41 89 c0 48 8b 7d f8 b8 07 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 44 89 c7 89 45 f8 e8 60 6a f8 ff 8b 45 f8
Jul 21 02:45:50 arch kernel: RSP: 002b:00007fffaf5d0530 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
Jul 21 02:45:50 arch kernel: RAX: ffffffffffffffda RBX: 000000007fffffff RCX: 00007f90c2f1c39d
Jul 21 02:45:50 arch kernel: RDX: 0000000000000334 RSI: 0000000000000011 RDI: 000055a4c9873340
Jul 21 02:45:50 arch kernel: RBP: 00007fffaf5d0550 R08: 0000000000000000 R09: 000000007fffffff
Jul 21 02:45:50 arch kernel: R10: 000055a4c6b95c80 R11: 0000000000000293 R12: 0000000000000334
Jul 21 02:45:50 arch kernel: R13: 0000000000000011 R14: 000055a4c9873340 R15: 000055a4c6b95c80
Jul 21 02:45:50 arch kernel: </TASK>
Jul 21 02:45:50 arch kernel: Modules linked in: nft_masq nft_chain_nat nf_nat bridge stp llc vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb ccm snd_seq_dummy snd_hrtimer rfcomm snd_seq snd_seq_device uhid cmac algif_hash algif_skcipher af_alg snd_ctl_led nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bnep iwlmvm mac80211 snd_acp3x_pdm_dma snd_acp3x_rn snd_soc_dmic snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_xtensa_dsp snd_sof_pci snd_sof intel_rapl_msr intel_rapl_common uvcvideo btusb kvm_amd libarc4 snd_hda_codec_realtek btrtl snd_sof_utils videobuf2_vmalloc btintel snd_hda_codec_generic videobuf2_memops uvc snd_soc_core iwlwifi snd_hda_scodec_component snd_hda_codec_hdmi btbcm videobuf2_v4l2 snd_compress btmtk kvm snd_hda_intel snd_pci_ps videodev snd_rpl_pci_acp6x snd_intel_dspcfg snd_acp_pci bluetooth snd_hda_codec videobuf2_common snd_acp_legacy_common cfg80211 snd_hwdep mc snd_pci_acp6x snd_hda_core snd_pci_acp5x rapl snd_pcm pcspkr snd_rn_pci_acp3x
Jul 21 02:45:50 arch kernel: hp_wmi snd_timer snd_acp_config sparse_keymap ucsi_acpi sp5100_tco snd_soc_acpi platform_profile wmi_bmof acpi_cpufreq typec_ucsi snd amd_sfh roles soundcore snd_pci_acp3x rfkill typec i2c_piix4 k10temp hp_accel lis3lv02d amd_pmc wireless_hotkey acpi_tad joydev mousedev mac_hid nls_iso8859_1 vfat fat usbip_host usbip_core crypto_user loop nfnetlink zram ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod amdgpu crct10dif_pclmul crc32_pclmul amdxcp crc32c_intel i2c_algo_bit polyval_clmulni drm_ttm_helper rtsx_pci_sdmmc polyval_generic ttm mmc_core ghash_clmulni_intel nvme drm_exec sha512_ssse3 gpu_sched serio_raw sha256_ssse3 nvme_core atkbd sha1_ssse3 drm_suballoc_helper libps2 aesni_intel drm_buddy ccp gf128mul vivaldi_fmap drm_display_helper crypto_simd rtsx_pci cryptd xhci_pci nvme_auth video i8042 xhci_pci_renesas crc16 serio wmi 9pnet_virtio 9p 9pnet netfs virtio_net net_failover failover virtio_blk hid_multitouch
Jul 21 02:45:50 arch kernel: i2c_hid_acpi i2c_hid
Jul 21 02:45:50 arch kernel: CR2: 000000b4ffffed39
Jul 21 02:45:50 arch kernel: ---[ end trace 0000000000000000 ]---
Jul 21 02:45:50 arch kernel: RIP: 0010:poll_freewait+0x58/0xa0
Jul 21 02:45:50 arch kernel: Code: 83 c3 40 e8 9a c4 ce ff 48 8b 7b b0 e8 21 33 fe ff 41 39 6d 28 7f de 4d 85 e4 74 40 49 8b 5c 24 08 49 8d 6c 24 10 48 83 eb 40 <48> 8b 7b 38 48 8d 73 10 e8 6b c4 ce ff 48 8b 3b e8 f3 32 fe ff 48
Jul 21 02:45:50 arch kernel: RSP: 0018:ffffafd196e578e8 EFLAGS: 00010202
Jul 21 02:45:50 arch kernel: RAX: dead000000000122 RBX: 000000b4ffffed01 RCX: 0000000000000001
Jul 21 02:45:50 arch kernel: RDX: ffffa09a0bc10590 RSI: 0000000000000292 RDI: ffffa09a00e95700
Jul 21 02:45:50 arch kernel: RBP: ffffa09a09197010 R08: 0000000000000001 R09: 0000000000000001
Jul 21 02:45:50 arch kernel: R10: 0000000000000002 R11: 0000000000000001 R12: ffffa09a09197000
Jul 21 02:45:50 arch kernel: R13: ffffafd196e57a70 R14: 0000000000000011 R15: 0000000000000001
Jul 21 02:45:50 arch kernel: FS: 00007f90be94ed80(0000) GS:ffffa09cff700000(0000) knlGS:0000000000000000
Jul 21 02:45:50 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 21 02:45:50 arch kernel: CR2: 000000b4ffffed39 CR3: 000000010dfbc000 CR4: 0000000000350ef0
Jul 21 02:45:50 arch kernel: note: gnome-shell[1648] exited with irqs disabled
Jul 21 02:45:50 arch systemd-timesyncd[714]: Contacted time server 71.123.46.185:123 (2.arch.pool.ntp.org).
Jul 21 02:45:50 arch systemd-timesyncd[714]: Initial clock synchronization to Sun 2024-07-21 02:45:50.385189 MDT.
Jul 21 02:45:54 arch kernel: BUG: unable to handle page fault for address: ffffafd196e57a88
Jul 21 02:45:54 arch kernel: #PF: supervisor read access in kernel mode
Jul 21 02:45:54 arch kernel: #PF: error_code(0x0000) - not-present page
Jul 21 02:45:54 arch kernel: PGD 100000067 P4D 100000067 PUD 10020d067 PMD 126169067 PTE 0
Jul 21 02:45:54 arch kernel: Oops: Oops: 0000 [#2] PREEMPT SMP NOPTI
Jul 21 02:45:54 arch kernel: CPU: 1 PID: 1656 Comm: gdbus Tainted: G D 6.10.0-11185-g2c9b3512402e-dirty #1 909b642174274273e9a5ff42844d49a454a06a9d
Jul 21 02:45:54 arch kernel: Hardware name: HP HP ENVY x360 Convertible 15-ds1xxx/87A9, BIOS F.14 10/13/2023
Jul 21 02:45:54 arch kernel: RIP: 0010:pollwake+0x57/0xa0
Jul 21 02:45:54 arch kernel: Code: 04 24 00 00 00 00 48 c7 44 24 08 00 00 00 00 48 c7 44 24 10 00 00 00 00 48 c7 44 24 18 00 00 00 00 48 c7 44 24 20 00 00 00 00 <48> 8b 78 18 48 c7 44 24 10 80 89 90 94 48 89 7c 24 08 c7 40 20 01
Jul 21 02:45:54 arch kernel: RSP: 0018:ffffafd196f6fd58 EFLAGS: 00010002
Jul 21 02:45:54 arch kernel: RAX: ffffafd196e57a70 RBX: ffffa09a443f7890 RCX: 0000000000000001
Jul 21 02:45:54 arch kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffa09a09197060
Jul 21 02:45:54 arch kernel: RBP: 0000000000000000 R08: ffffa09a443f7890 R09: 0000000000000000
Jul 21 02:45:54 arch kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa09a443f7878
Jul 21 02:45:54 arch kernel: R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000001
Jul 21 02:45:54 arch kernel: FS: 00007f90b9e006c0(0000) GS:ffffa09cff680000(0000) knlGS:0000000000000000
Jul 21 02:45:54 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 21 02:45:54 arch kernel: CR2: ffffafd196e57a88 CR3: 000000010dfbc000 CR4: 0000000000350ef0
Jul 21 02:45:54 arch kernel: Call Trace:
Jul 21 02:45:54 arch kernel: <TASK>
Jul 21 02:45:54 arch kernel: ? __die_body.cold+0x19/0x27
Jul 21 02:45:54 arch kernel: ? page_fault_oops+0x15a/0x2d0
Jul 21 02:45:54 arch kernel: ? search_bpf_extables+0x5f/0x80
Jul 21 02:45:54 arch kernel: ? exc_page_fault+0x18a/0x190
Jul 21 02:45:54 arch kernel: ? asm_exc_page_fault+0x26/0x30
Jul 21 02:45:54 arch kernel: ? pollwake+0x57/0xa0
Jul 21 02:45:54 arch kernel: ? __pfx_bpf_lsm_file_permission+0x10/0x10
Jul 21 02:45:54 arch kernel: __wake_up_common+0x78/0xa0
Jul 21 02:45:54 arch kernel: eventfd_write+0xc9/0x1f0
Jul 21 02:45:54 arch kernel: ? __pfx_bpf_lsm_file_permission+0x10/0x10
Jul 21 02:45:54 arch kernel: ? security_file_permission+0x36/0x50
Jul 21 02:45:54 arch kernel: vfs_write+0xf8/0x460
Jul 21 02:45:54 arch kernel: ? syscall_exit_to_user_mode+0x10/0x200
Jul 21 02:45:54 arch kernel: ? do_syscall_64+0x8e/0x190
Jul 21 02:45:54 arch kernel: ? do_syscall_64+0x8e/0x190
Jul 21 02:45:54 arch kernel: ksys_write+0x6d/0xf0
Jul 21 02:45:54 arch kernel: do_syscall_64+0x82/0x190
Jul 21 02:45:54 arch kernel: ? do_syscall_64+0x8e/0x190
Jul 21 02:45:54 arch kernel: ? exc_page_fault+0x81/0x190
Jul 21 02:45:54 arch kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 21 02:45:54 arch kernel: RIP: 0033:0x7f90c2f1d53d
Jul 21 02:45:54 arch kernel: Code: e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 68 58 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 45 f8 e8 bf 58 f8 ff 48 8b
Jul 21 02:45:54 arch kernel: RSP: 002b:00007f90b9dfe3d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Jul 21 02:45:54 arch kernel: RAX: ffffffffffffffda RBX: 000055a4ca48d790 RCX: 00007f90c2f1d53d
Jul 21 02:45:54 arch kernel: RDX: 0000000000000008 RSI: 00007f90b9dfe400 RDI: 000000000000003b
Jul 21 02:45:54 arch kernel: RBP: 00007f90b9dfe3f0 R08: 0000000000000000 R09: 0000000000000001
Jul 21 02:45:54 arch kernel: R10: 00007f8ff400dda0 R11: 0000000000000293 R12: 00007f90b9dfe400
Jul 21 02:45:54 arch kernel: R13: 000055a4c6baff70 R14: 0000000000000001 R15: 00007f90a0046b20
Jul 21 02:45:54 arch kernel: </TASK>
Jul 21 02:45:54 arch kernel: Modules linked in: nft_masq nft_chain_nat nf_nat bridge stp llc vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb ccm snd_seq_dummy snd_hrtimer rfcomm snd_seq snd_seq_device uhid cmac algif_hash algif_skcipher af_alg snd_ctl_led nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bnep iwlmvm mac80211 snd_acp3x_pdm_dma snd_acp3x_rn snd_soc_dmic snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_xtensa_dsp snd_sof_pci snd_sof intel_rapl_msr intel_rapl_common uvcvideo btusb kvm_amd libarc4 snd_hda_codec_realtek btrtl snd_sof_utils videobuf2_vmalloc btintel snd_hda_codec_generic videobuf2_memops uvc snd_soc_core iwlwifi snd_hda_scodec_component snd_hda_codec_hdmi btbcm videobuf2_v4l2 snd_compress btmtk kvm snd_hda_intel snd_pci_ps videodev snd_rpl_pci_acp6x snd_intel_dspcfg snd_acp_pci bluetooth snd_hda_codec videobuf2_common snd_acp_legacy_common cfg80211 snd_hwdep mc snd_pci_acp6x snd_hda_core snd_pci_acp5x rapl snd_pcm pcspkr snd_rn_pci_acp3x
Jul 21 02:45:54 arch kernel: hp_wmi snd_timer snd_acp_config sparse_keymap ucsi_acpi sp5100_tco snd_soc_acpi platform_profile wmi_bmof acpi_cpufreq typec_ucsi snd amd_sfh roles soundcore snd_pci_acp3x rfkill typec i2c_piix4 k10temp hp_accel lis3lv02d amd_pmc wireless_hotkey acpi_tad joydev mousedev mac_hid nls_iso8859_1 vfat fat usbip_host usbip_core crypto_user loop nfnetlink zram ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod amdgpu crct10dif_pclmul crc32_pclmul amdxcp crc32c_intel i2c_algo_bit polyval_clmulni drm_ttm_helper rtsx_pci_sdmmc polyval_generic ttm mmc_core ghash_clmulni_intel nvme drm_exec sha512_ssse3 gpu_sched serio_raw sha256_ssse3 nvme_core atkbd sha1_ssse3 drm_suballoc_helper libps2 aesni_intel drm_buddy ccp gf128mul vivaldi_fmap drm_display_helper crypto_simd rtsx_pci cryptd xhci_pci nvme_auth video i8042 xhci_pci_renesas crc16 serio wmi 9pnet_virtio 9p 9pnet netfs virtio_net net_failover failover virtio_blk hid_multitouch
Jul 21 02:45:54 arch kernel: i2c_hid_acpi i2c_hid
Jul 21 02:45:54 arch kernel: CR2: ffffafd196e57a88
Jul 21 02:45:54 arch kernel: ---[ end trace 0000000000000000 ]---
Jul 21 02:45:54 arch kernel: RIP: 0010:poll_freewait+0x58/0xa0
Jul 21 02:45:54 arch kernel: Code: 83 c3 40 e8 9a c4 ce ff 48 8b 7b b0 e8 21 33 fe ff 41 39 6d 28 7f de 4d 85 e4 74 40 49 8b 5c 24 08 49 8d 6c 24 10 48 83 eb 40 <48> 8b 7b 38 48 8d 73 10 e8 6b c4 ce ff 48 8b 3b e8 f3 32 fe ff 48
Jul 21 02:45:54 arch kernel: RSP: 0018:ffffafd196e578e8 EFLAGS: 00010202
Jul 21 02:45:54 arch kernel: RAX: dead000000000122 RBX: 000000b4ffffed01 RCX: 0000000000000001
Jul 21 02:45:54 arch kernel: RDX: ffffa09a0bc10590 RSI: 0000000000000292 RDI: ffffa09a00e95700
Jul 21 02:45:54 arch kernel: RBP: ffffa09a09197010 R08: 0000000000000001 R09: 0000000000000001
Jul 21 02:45:54 arch kernel: R10: 0000000000000002 R11: 0000000000000001 R12: ffffa09a09197000
Jul 21 02:45:54 arch kernel: R13: ffffafd196e57a70 R14: 0000000000000011 R15: 0000000000000001
Jul 21 02:45:54 arch kernel: FS: 00007f90b9e006c0(0000) GS:ffffa09cff680000(0000) knlGS:0000000000000000
Jul 21 02:45:54 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 21 02:45:54 arch kernel: CR2: ffffafd196e57a88 CR3: 000000010dfbc000 CR4: 0000000000350ef0
Jul 21 02:45:54 arch kernel: note: gdbus[1656] exited with irqs disabled
Jul 21 02:45:54 arch kernel: note: gdbus[1656] exited with preempt_count 1

Thanks!
>>>
>>> How I tracked down the problem commits:
>>>
>>> I was not able to successfully "git bisect" this bug - I seemed to run
>>> into a mess of unrelated problems/errors that sent me down a rabbit hole
>>> chasing who knows what. I had already manually narrowed down the bug to
>>> amd_sfh by blacklisting modules, so I reverted each
>>> drivers/hid/amd-sfh-hid commit on the stable linux-6.9.y branch (v6.9.8
>>> known "bad"), back to v6.6 (known "good"), and then manually bisected
>>> the revert commits, landing on "HID: amd_sfh: Improve boot time when SFH
>>> is available" (2105e8e00da4) as the first "bad" commit.
>>>
>>> I wanted to be able to test with only the "bad" commit(s) removed; it
>>> turns out 6296562f30b1 ("HID: amd_sfh: Extend MP2 register access to
>>> SFH") needs to be reverted to do that. Everything seems fine with these
>>> two commits reverted (again, this is on the stable linux-6.9.y branch).
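The following is a minimal Python sketch of the revert-stack bisection described above. It makes several assumptions:

- The drivers/hid/amd-sfh-hid commits are listed newest first, which is the order in which the reverts are applied.
- Applying none of the reverts is "bad" (v6.9.8).
- Applying all of them is "good" (equivalent to v6.6).
- The result flips only once along the stack.

The names first_bad_commit and is_good_with_reverts are hypothetical. Each call to the predicate stands for building, booting, and testing one kernel.

```python
from typing import Callable, Sequence


def first_bad_commit(
    commits: Sequence[str],
    is_good_with_reverts: Callable[[int], bool],
) -> str:
    """Binary-search a stack of reverts for the culprit commit.

    commits: commits to revert, newest first (the order reverts are applied).
    is_good_with_reverts(k): hypothetical predicate; True if the kernel built
    with the first k reverts applied passes testing.
    Assumes k == 0 is bad, k == len(commits) is good, and the predicate
    switches from bad to good exactly once.
    """
    if not commits:
        raise ValueError("need at least one commit to bisect")
    lo, hi = 0, len(commits)  # invariant: lo reverts -> bad, hi reverts -> good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good_with_reverts(mid):
            hi = mid  # reverting the first mid commits already fixes it
        else:
            lo = mid  # still bad, so the culprit lies deeper in the stack
    # Reverting commits[:hi] is good but commits[:hi - 1] is bad,
    # so commits[hi - 1] is the first bad commit.
    return commits[hi - 1]
```

For example, consider commits ["c5", "c4", "c3", "c2", "c1"] and a predicate that only reports good once c5, c4 and c3 are reverted. In that case the function returns "c3". Real histories are messier than the single-flip assumption. As described above, testing with only 2105e8e00da4 removed also required reverting 6296562f30b1.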
>>>
>>> When testing, "bad" commits usually quickly display some variation of
>>> the page dump/BTRFS errors, similar to the dmesg output below. I
>>> consider commits "good" if the system survives "stress-ng --all 2
>>> --vm-bytes 50% --minimize --syslog --status 10 -t 5m" (run as a non-root
>>> user), which was usually followed by building the next test kernel. The
>>> "bad" commits often show errors before I even get to the stress test.
>>>
>>> Examples of error messages from dmesg:
>>>
>>> [ 653.364343] page: refcount:4 mapcount:0 mapping:00000000b159289f
>>> index:0x585a7cec pfn:0x10b5c1
>>> [ 653.364353] memcg:ffff8f2600918000
>>> [ 653.364354] aops:btree_aops ino:1
>>> [ 653.364358] flags:
>>> 0x17ffffd000802a(uptodate|lru|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
>>> [ 653.364361] page_type: 0xffffffff()
>>> [ 653.364363] raw: 0017ffffd000802a fffff1da87ee3288 fffff1da842d70c8
>>> ffff8f260c719458
>>> [ 653.364365] raw: 00000000585a7cec ffff8f26cd09e0f0 00000004ffffffff
>>> ffff8f2600918000
>>> [ 653.364366] page dumped because: eb page dump
>>> [ 653.364367] BTRFS critical (device dm-0): corrupt leaf: root=7
>>> block=6071604133888 slot=159, unexpected item end, have 2768254010
>>> expect 13379
>>> [ 653.364371] BTRFS info (device dm-0): leaf 6071604133888 gen 679995
>>> total ptrs 353 free space 322 owner 7
>>> [ 653.364373] item 0 key (18446744073709551606 128 1062871883776)
>>> itemoff 16271 itemsize 12
>>> [ 653.364375] item 1 key (18446744073709551606 128 1062871896064)
>>> itemoff 16263 itemsize 8
>>> [ 653.364376] item 2 key (18446744073709551606 128 1062871904256)
>>> itemoff 16255 itemsize 8
>>> ...
>>> [ 653.364762] item 350 key (18446744073709551606 128 1062879260672)
>>> itemoff 9227 itemsize 12
>>> [ 653.364763] item 351 key (18446744073709551606 128 1062879272960)
>>> itemoff 9223 itemsize 4
>>> [ 653.364764] item 352 key (18446744073709551606 128 1062879277056)
>>> itemoff 9147 itemsize 76
>>> [ 653.364766] BTRFS error (device dm-0): block=6071604133888 write time
>>> tree block corruption detected
>>> [ 653.375440] BTRFS: error (device dm-0) in
>>> btrfs_commit_transaction:2511: errno=-5 IO failure (Error while writing
>>> out transaction)
>>> [ 653.375453] BTRFS info (device dm-0 state E): forced readonly
>>> [ 653.375458] BTRFS warning (device dm-0 state E): Skipping commit of
>>> aborted transaction.
>>> [ 653.375461] BTRFS error (device dm-0 state EA): Transaction aborted
>>> (error -5)
>>> [ 653.375465] BTRFS: error (device dm-0 state EA) in
>>> cleanup_transaction:2005: errno=-5 IO failure
>>> [ 653.375582] BTRFS warning (device dm-0 state EA): Skipping commit of
>>> aborted transaction.
>>> [ 653.375586] BTRFS: error (device dm-0 state EA) in
>>> cleanup_transaction:2005: errno=-5 IO failure
>>>
>>> Another example:
>>>
>>> [ 5478.134046] page: refcount:4 mapcount:0 mapping:0000000010080c01
>>> index:0x5459ff30 pfn:0x168c7f
>>> [ 5478.134054] memcg:ffff89c240988000
>>> [ 5478.134056] aops:btree_aops ino:1
>>> [ 5478.134061] flags:
>>> 0x17ffffd800802a(uptodate|lru|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
>>> [ 5478.134064] page_type: 0xffffffff()
>>> [ 5478.134066] raw: 0017ffffd800802a ffffcc5d043e2bc8 ffffcc5d05a08c88
>>> ffff89c249968338
>>> [ 5478.134068] raw: 000000005459ff30 ffff89c246fa22d0 00000004ffffffff
>>> ffff89c240988000
>>> [ 5478.134069] page dumped because: eb page dump
>>> [ 5478.134071] BTRFS critical (device dm-0): corrupt leaf: root=2161
>>> block=5796594384896 slot=84 ino=2434728, invalid inode generation: has
>>> 72057594122450740 expect (0, 664473]
>>> [ 5478.134075] BTRFS info (device dm-0): leaf 5796594384896 gen 664472
>>> total ptrs 120 free space 1223 owner 2161
>>> [ 5478.134077] item 0 key (2434713 24 3817753667) itemoff 16210
>>> itemsize 73
>>> [ 5478.134078] item 1 key (2434713 108 0) itemoff 15359 itemsize 851
>>> [ 5478.134080] inline extent data size 830
>>> [ 5478.134081] item 2 key (2434714 1 0) itemoff 15199 itemsize 160
>>> [ 5478.134082] inode generation 636724 size 758 mode 100644
>>> [ 5478.134083] item 3 key (2434714 12 2348495) itemoff 15181 itemsize 18
>>> ...
>>> [ 5478.134242] item 117 key (2434733 108 0) itemoff 4398 itemsize 329
>>> [ 5478.134243] inline extent data size 308
>>> [ 5478.134244] item 118 key (2434734 1 0) itemoff 4238 itemsize 160
>>> [ 5478.134245] inode generation 636724 size 30 mode 40755
>>> [ 5478.134245] item 119 key (2434734 12 2434375) itemoff 4223 itemsize 15
>>> [ 5478.134247] BTRFS error (device dm-0): block=5796594384896 write time
>>> tree block corruption detected
>>> [ 5478.263726] BTRFS: error (device dm-0) in
>>> btrfs_commit_transaction:2511: errno=-5 IO failure (Error while writing
>>> out transaction)
>>> [ 5478.263733] BTRFS info (device dm-0 state E): forced readonly
>>> [ 5478.263736] BTRFS warning (device dm-0 state E): Skipping commit of
>>> aborted transaction.
>>> [ 5478.263737] BTRFS error (device dm-0 state EA): Transaction aborted
>>> (error -5)
>>> [ 5478.263739] BTRFS: error (device dm-0 state EA) in
>>> cleanup_transaction:2005: errno=-5 IO failure
>>> [ 5478.264582] BTRFS warning (device dm-0 state EA): Skipping commit of
>>> aborted transaction.
>>> [ 5478.264595] BTRFS: error (device dm-0 state EA) in
>>> cleanup_transaction:2005: errno=-5 IO failure
>> #regzbot ^introduced: 6296562f30b1
>> #regzbot summary: hid: amd_sfh: memory/page corruption correlated with
>> 6296562f30b1 or 2105e8e00da4
>> #regzbot ignore-activity
>
>