Message-ID: <kkenglf5lkvu2um3o44umdxaoeh5zgsaus73kauc6vwcqpnsfv@5i6xtjnxtb5c>
Date: Mon, 22 Jul 2024 18:30:44 +0200
From: Benjamin Tissoires <bentiss@...nel.org>
To: Chris Hixon <linux-kernel-bugs@...ontech.com>,
Basavaraj Natikar <bnatikar@....com>
Cc: Linux regressions mailing list <regressions@...ts.linux.dev>,
Jiri Kosina <jkosina@...e.com>, LKML <linux-kernel@...r.kernel.org>, Li Ma <li.ma@....com>,
amd-gfx@...ts.freedesktop.org, Alexander.Deucher@....com, yifan1.zhang@....com,
linux-input@...r.kernel.org, Basavaraj Natikar <basavaraj.natikar@....com>
Subject: Re: [BUG] HID: amd_sfh (drivers/hid/amd-sfh-hid/): memory/page corruption

On Jul 21 2024, Chris Hixon wrote:
> On 7/21/24 00:20, Basavaraj Natikar wrote:
>
> > On 7/17/2024 4:51 PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> >> On 15.07.24 06:39, Chris Hixon wrote:
> >>> System: HP ENVY x360 Convertible 15-ds1xxx; AMD Ryzen 7 4700U with
> >>> Radeon Graphics
> >>>
> >>> Problem commits (introduced in v6.9-rc1):
> >>> 6296562f30b1 HID: amd_sfh: Extend MP2 register access to SFH
> >>> 2105e8e00da4 HID: amd_sfh: Improve boot time when SFH is available
> >>>
> >>> It appears amd_sfh commits 6296562f30b1 and 2105e8e00da4 correlate with
> >>> some form of memory/page corruption.
> >> Hi! From a quick search on lore, it looks like Basavaraj Natikar, who
> >> authored those two commits, has been inactive for a few days. This is totally
> >> fine, but given the nature of the problem slightly unfortunate. That's
> >> why I'm trying to raise awareness of this report by adding the
> >> subsystems maintainers, a few lists, and a few people to the list of
> >> recipients that were involved in the submission of those two patches.
> >> With a bit of luck somebody might be able to help out. Ciao, Thorsten
> >>
> >>> On my system, this typically
> >>> presents itself as a page dump followed by BTRFS errors, usually
> >>> involving "corrupt leaf" (see dmesg output below); often the BTRFS
> >>> filesystem becomes read-only afterwards. Note that the underlying NVME
> >>> disk seems fine, and the BTRFS filesystem does not actually appear to be
> >>> corrupt when booted/checked from kernels without this bug (no BTRFS
> >>> errors or I/O errors reported on non-problem kernels).
> >>>
> >>> I have no problems when I blacklist the amd_sfh module (any kernel
> >>> version), or revert both commits 6296562f30b1 and 2105e8e00da4 (on
> >>> stable, linux-6.9.y). I have no problems on any recent linux-mainline
> >>> (v6.10{,-rc*}) when reverting these two commits (in addition to
> >>> reverting 7902ec988a9a and 6856f079cd45 to successfully build the
> >>> kernel). I have had no problems with any v6.6.y, v6.7.y, or v6.8.y version.
> >>>
> >>> It is curious BTRFS always seems involved, but problems go away with
> >>> these amd_sfh commits reverted (or amd_sfh disabled).
> >>>
> >>> Further notes:
> >>>
> >>> I have not specifically used the amd_sfh module for anything. As far as
> >>> I've been able to determine, my system has the "Sensor Fusion Hub" mp2
> >>> chip, but has no supported sensors/sub-devices (or I need to do
> >>> something to enable them), (or there is an error while detecting
> >>> sensors?). All logs I've checked contain something like:
> >>>
> >>> Jul 09 04:14:37 arch kernel: pcie_mp2_amd 0000:04:00.7: enabling device
> >>> (0000 -> 0002)
> >>> Jul 09 04:15:07 arch kernel: pcie_mp2_amd 0000:04:00.7: Failed to
> >>> discover, sensors not enabled is 0
> >>> Jul 09 04:15:07 arch kernel: pcie_mp2_amd 0000:04:00.7:
> >>> amd_sfh_hid_client_init failed err -95
> >>>
> >>> Excerpt from lshw:
> >>> *-generic:1 UNCLAIMED
> >>> description: Signal processing controller
> >>> product: Sensor Fusion Hub
> >>> vendor: Advanced Micro Devices, Inc. [AMD]
> >>> physical id: 0.7
> >>> bus info: pci@...0:04:00.7
> >>> version: 00
> >>> width: 32 bits
> >>> clock: 33MHz
> >>> capabilities: pm pciexpress msi msix cap_list
> >>> configuration: latency=0
> >>> resources: memory:fe000000-fe0fffff
> >>> memory:fe4cc000-fe4cdfff
> >
> > Could you please check with the latest version, including the patch below?
> >
> > https://lore.kernel.org/all/20240718111616.3012155-1-Basavaraj.Natikar@amd.com/
> >
> > Thanks,
> > --
> > Basavaraj
> >
>
> Hi,
>
> Unfortunately, that patch doesn't fix the issue. I do get different crashes...

Thanks for the quick testing.

Basavaraj, was the patch above an attempt at fixing this particular
regression, or did it come out of something else?

In other words: should I apply the fix even though it is still not the
correct one for Chris?

Cheers,
Benjamin