[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5ecc96da-0c87-45f6-ab57-c3ea8eb28de1@gmx.de>
Date: Sat, 24 Jan 2026 20:11:04 +0100
From: Armin Wolf <W_Armin@....de>
To: TINSAE TADESSE <tinsaetadesse2015@...il.com>,
Guenter Roeck <linux@...ck-us.net>
Cc: "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] hwmon: spd5118: Do not fail resume on temporary I2C
errors
Am 24.01.26 um 15:45 schrieb TINSAE TADESSE:
> On Fri, Jan 16, 2026 at 9:24 AM Guenter Roeck <linux@...ck-us.net> wrote:
>> On 1/15/26 05:50, TINSAE TADESSE wrote:
>>> On Wed, Jan 14, 2026 at 5:23 PM Guenter Roeck <linux@...ck-us.net> wrote:
>>>> On 1/14/26 05:07, TINSAE TADESSE wrote:
>>>> ...
>>>>>>> Hi Guenter,
>>>>>>>
>>>>>>> I tested changing the i801 SMBus controller to use
>>>>>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() instead of
>>>>>>> DEFINE_SIMPLE_DEV_PM_OPS() as a diagnostic experiment. With this
>>>>>>> change, spd5118 resume failures (-ENXIO)
>>>>>>> still persist, suggesting PM ordering alone is insufficient and other
>>>>>>> firmware interactions are involved.
>>>>>> How about the problem in the suspend function ? Is that also still seen ?
>>>>>>
>>>>>> Also, the subject talks about -EIO. Is that still seen ?
>>>>>>
>>>>>> Either case, can you enable debug logs for the i801 driver ?
>>>>>> It should generate log entries when it reports errors.
>>>>>>
>>>>>> Thanks,
>>>>>> Guenter
>>>>>>
>>>>> Hi Guenter,
>>>>>
>>>>> Thank you for the questions. To clarify:
>>>>>
>>>> Please do not drop mailing lists from replies.
>>>>
>>>>> 1) I have not observed any failures in the suspend path. The suspend
>>>>> callback completes successfully, and
>>>>> I have not seen I2C errors or warnings during suspend at any point.
>>>> Sorry, I seem to be missing something.
>>>>
>>>> In that case, what is the point of patch 3/3 of your series which
>>>> removes hardware accesses from the suspend function ?
>>>>
>>>>> 2) I have also not observed -EIO in my testing. The error consistently
>>>>> reported on resume and subsequent hwmon access is -ENXIO.
>>>>> Earlier references to -EIO were based on assumptions rather than
>>>>> observed logs, and I should have been clearer about that.
>>>>>
>>>> Thanks for the clarification.
>>>>
>>>> Guenter
>>>>
>>>>> I am enabling debug logging for the i801 driver to collect more
>>>>> concrete evidence of controller state during resume.
>>> Hi Guenter,
>>>
>>>> Sorry, I seem to be missing something.
>>>>
>>>> In that case, what is the point of patch 3/3 of your series which
>>>> removes hardware accesses from the suspend function ?
>>> You are right to question this, and I agree that it needs clarification.
>>>
>>> Patch 3/3 was originally proposed under the assumption that the resume failures
>>> were caused by spd5118 performing I2C transactions while the
>>> controller was not yet available,
>>> and that removing hardware accesses from the suspend path might
>>> mitigate the issue.
>>> At that point, I assumed the problem was limited to the resume callback.
>>>
>>> After enabling detailed i801 debug logging and testing with
>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() in the i801 driver,
>>> it became clear that this assumption was incorrect. The controller
>>> itself reports "i801_smbus: No response"
>>> both during suspend and immediately after resume, and spd5118 merely
>>> propagates the resulting -ENXIO.
>> Outch, that really hurts, because it means that something is seriously
>> broken in both the suspend and resume path. The device _must_ be accessible
>> in the suspend path. Otherwise there is no guarantee that the device is
>> accessible for normal (pre-suspend) operation. After all, someone could
>> run a script reading sysfs attributes in a tight loop continuously,
>> or the thermal subsystem could try to access the chip. That would suddenly
>> start to fail if something in the device access path starts to be suspended
>> while the underlying hardware is still believed to be operational.
>>
>> I could imagine some hack/quirk for the resume path, such as delaying resume
>> for some period of time for affected hardware, but I have no idea what to
>> do on the suspend side. We can not just drop device writes during suspend
>> because some broken hardware/firmware does not let us actually access
>> (and thus suspend) the hardware anymore by the time the suspend function
>> is called.
>>
>> Guenter
>>
>>> This indicates that the issue is not caused by spd5118 suspend/resume
>>> behavior, but by the unavailability of the
>>> SMBus controller due to platform or firmware interactions during
>>> s2idle transitions.
>>>
>>> Given this, I agree that patch 3/3 does not address the root cause and
>>> does not provide a justified improvement.
>>> I am therefore fine with dropping it.
>>>
>>> Thank you for pointing this out.
>>>
> Hi Guenter,
>
> Thanks for the continued review and for questioning the earlier
> direction — that helped narrow this down properly.
>
> After enabling full i801 debug logging (included below in this email)
> and inspecting both drivers, it became clear that the resume
> failures are not caused by spd5118 accessing the hardware too
> early, nor by PM ordering issues. Instead, the SMBus controller
> explicitly reports “SPD Write Disable is set”, and any
> block write transactions to the SPD device consistently fail with
> DEV_ERR. spd5118 merely propagates the resulting -ENXIO.
Oh no, this likely happens even when merely reading values, as the spd5118
uses a page register to switch between different register pages. In order
to access temperature data (page 0), you might already have to issue a
write access to the page register. The only reason why it works for you
is that the spd5118 likely already has page 0 selected by the system firmware
during boot.
> With that in mind, I have dropped the earlier patch that attempted
> to remove hardware access from the suspend path
> unconditionally.
> That patch does not address the root cause and is no longer
> part of the series.
>
> I am instead proposing a minimal 2-patch series:
>
> 1/2 records whether the platform enforces SPD write disable at probe
> time (no behavior change).
> 2/2 avoids regcache writeback during suspend/resume when the device
> operates in read-only mode, while still allowing read access to
> temperature inputs.
>
> This avoids issuing SMBus transactions that are architecturally
> blocked on these systems, and does not rely on
> delays or PM ordering assumptions, and leaves behavior unchanged on
> platforms where SPD writes are permitted.
>
> If this direction looks acceptable, I’m happy to re-spin and post the
> series formally.
>
> Thanks again for the guidance.
I do not know if this is a reliable solution, as the system firmware might
select a different register page during resume. This will then prevent the
driver from functioning.
I would love to see the spd5118 driver working on such systems with reduced
functionality, but i will leave it to Guenter to decide if this approach is
maintainable.
Besides that: did the spd5118 driver load automatically on your device?
Thanks,
Armin Wolf
>
>
> [ 13.530830] i2c-core: driver [spd5118] registered
> ...
> [ 29.555298] i801_smbus 0000:00:1f.4: SPD Write Disable is set
> ...
> [ 29.582447] i2c i2c-14: Creating spd5118 at 0x52
> [ 29.590744] spd5118 14-0052: probe
> ...
> [ 29.618983] spd5118 14-0052: DDR5 temperature sensor: vendor
> 0x00:0xb3 revision 2.2
> [ 29.619662] i2c i2c-14: client [spd5118] registered with bus id 14-0052
> ...
> [ 1057.504362] PM: suspend entry (s2idle)
> [ 1057.944405] spd5118 14-0052: Entering suspend path...
> [ 1057.945387] i801_smbus 0000:00:1f.4: i801 access: command=1a,
> size=8, addr=0x52, read_write=1
> [ 1057.946251] i801_smbus 0000:00:1f.4: i801 access: command=b,
> size=8, addr=0x52, read_write=1
> [ 1057.946866] i801_smbus 0000:00:1f.4: i801 access: command=1a,
> size=8, addr=0x52, read_write=1
> [ 1057.948324] i801_smbus 0000:00:1f.4: i801 access: command=b,
> size=8, addr=0x52, read_write=1
> [ 1057.949817] i801_smbus 0000:00:1f.4: i801 access: command=1a,
> size=8, addr=0x52, read_write=0
> [ 1057.949904] i801_smbus 0000:00:1f.4: i801 timeout: status=0x4,
> SMBHSTCNT=0x15, SMBHSTSTS_DEV_ERR=4
> [ 1057.949952] i801_smbus 0000:00:1f.4: No response
> [ 1057.950215] i801_smbus 0000:00:1f.4: Entering suspend path...
> [ 1059.338647] ACPI: EC: interrupt blocked
> [ 1060.756385] ACPI: EC: interrupt unblocked
> [ 1060.926423] i801_smbus 0000:00:1f.4: Entering resume path...
> [ 1060.926623] spd5118 14-0052: Entering resume path...
> [ 1060.927930] i801_smbus 0000:00:1f.4: i801 access: command=b,
> size=8, addr=0x52, read_write=0
> [ 1060.927969] i801_smbus 0000:00:1f.4: i801 timeout: status=0x4,
> SMBHSTCNT=0x15, SMBHSTSTS_DEV_ERR=4
> [ 1060.927995] i801_smbus 0000:00:1f.4: No response
> [ 1060.928477] i801_smbus 0000:00:1f.4: i801 access: command=b,
> size=8, addr=0x52, read_write=0
> [ 1060.928517] i801_smbus 0000:00:1f.4: i801 timeout: status=0x4,
> SMBHSTCNT=0x15, SMBHSTSTS_DEV_ERR=4
> [ 1060.928543] i801_smbus 0000:00:1f.4: No response
> [ 1060.928582] spd5118 14-0052: Failed to write b = 0: -6
> [ 1060.928707] spd5118 14-0052: PM: dpm_run_callback(): spd5118_resume
> returns -6
> [ 1060.928981] spd5118 14-0052: PM: failed to resume async: error -6
> [ 1062.414560] PM: suspend exit
>
Powered by blists - more mailing lists