lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ12PfMN5SOcYc6vBJEz57YVcxXAOker4WA61Xh1znP=i8aBRg@mail.gmail.com>
Date: Sat, 24 Jan 2026 17:45:50 +0300
From: TINSAE TADESSE <tinsaetadesse2015@...il.com>
To: Guenter Roeck <linux@...ck-us.net>
Cc: "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] hwmon: spd5118: Do not fail resume on temporary I2C errors

On Fri, Jan 16, 2026 at 9:24 AM Guenter Roeck <linux@...ck-us.net> wrote:
>
> On 1/15/26 05:50, TINSAE TADESSE wrote:
> > On Wed, Jan 14, 2026 at 5:23 PM Guenter Roeck <linux@...ck-us.net> wrote:
> >>
> >> On 1/14/26 05:07, TINSAE TADESSE wrote:
> >> ...
> >>>>> Hi Guenter,
> >>>>>
> >>>>> I tested changing the i801 SMBus controller to use
> >>>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() instead of
> >>>>> DEFINE_SIMPLE_DEV_PM_OPS() as a diagnostic experiment. With this
> >>>>> change, spd5118 resume failures (-ENXIO)
> >>>>> still persist, suggesting PM ordering alone is insufficient and other
> >>>>> firmware interactions are involved.
> >>>>
> >>>> How about the problem in the suspend function ? Is that also still seen ?
> >>>>
> >>>> Also, the subject talks about -EIO. Is that still seen ?
> >>>>
> >>>> Either case, can you enable debug logs for the i801 driver ?
> >>>> It should generate log entries when it reports errors.
> >>>>
> >>>> Thanks,
> >>>> Guenter
> >>>>
> >>>
> >>> Hi Guenter,
> >>>
> >>> Thank you for the questions. To clarify:
> >>>
> >> Please do not drop mailing lists from replies.
> >>
> >>> 1) I have not observed any failures in the suspend path. The suspend
> >>> callback completes successfully, and
> >>> I have not seen I2C errors or warnings during suspend at any point.
> >>
> >> Sorry, I seem to be missing something.
> >>
> >> In that case, what is the point of patch 3/3 of your series which
> >> removes hardware accesses from the suspend function ?
> >>
> >>> 2) I have also not observed -EIO in my testing. The error consistently
> >>> reported on resume and subsequent hwmon access is -ENXIO.
> >>> Earlier references to -EIO were based on assumptions rather than
> >>> observed logs, and I should have been clearer about that.
> >>>
> >>
> >> Thanks for the clarification.
> >>
> >> Guenter
> >>
> >>> I am enabling debug logging for the i801 driver to collect more
> >>> concrete evidence of controller state during resume.
> >>
> >
> > Hi Guenter,
> >
> >> Sorry, I seem to be missing something.
> >>
> >> In that case, what is the point of patch 3/3 of your series which
> >> removes hardware accesses from the suspend function ?
> >
> > You are right to question this, and I agree that it needs clarification.
> >
> > Patch 3/3 was originally proposed under the assumption that the resume failures
> > were caused by spd5118 performing I2C transactions while the
> > controller was not yet available,
> > and that removing hardware accesses from the suspend path might
> > mitigate the issue.
> > At that point, I assumed the problem was limited to the resume callback.
> >
> > After enabling detailed i801 debug logging and testing with
> > SET_LATE_SYSTEM_SLEEP_PM_OPS() in the i801 driver,
> > it became clear that this assumption was incorrect. The controller
> > itself reports "i801_smbus: No response"
> > both during suspend and immediately after resume, and spd5118 merely
> > propagates the resulting -ENXIO.
>
> Outch, that really hurts, because it means that something is seriously
> broken in both the suspend and resume path. The device _must_ be accessible
> in the suspend path. Otherwise there is no guarantee that the device is
> accessible for normal (pre-suspend) operation. After all, someone could
> run a script reading sysfs attributes in a tight loop continuously,
> or the thermal subsystem could try to access the chip. That would suddenly
> start to fail if something in the device access path starts to be suspended
> while the underlying hardware is still believed to be operational.
>
> I could imagine some hack/quirk for the resume path, such as delaying resume
> for some period of time for affected hardware, but I have no idea what to
> do on the suspend side. We can not just drop device writes during suspend
> because some broken hardware/firmware does not let us actually access
> (and thus suspend) the hardware anymore by the time the suspend function
> is called.
>
> Guenter
>
> > This indicates that the issue is not caused by spd5118 suspend/resume
> > behavior, but by the unavailability of the
> > SMBus controller due to platform or firmware interactions during
> > s2idle transitions.
> >
> > Given this, I agree that patch 3/3 does not address the root cause and
> > does not provide a justified improvement.
> > I am therefore fine with dropping it.
> >
> > Thank you for pointing this out.
> >
>

Hi Guenter,

Thanks for the continued review and for questioning the earlier
direction — that helped narrow this down properly.

After enabling full i801 debug logging (included below in this email)
and inspecting both drivers, it became clear that the resume
failures are not caused by spd5118 accessing the hardware too
early, nor by PM ordering issues. Instead, the SMBus controller
explicitly reports “SPD Write Disable is set”, and any
block write transactions to the SPD device consistently fail with
DEV_ERR. spd5118 merely propagates the resulting -ENXIO.

With that in mind, I have dropped the earlier patch that attempted
to remove hardware access from the suspend path
unconditionally.
That patch does not address the root cause and is no longer
part of the series.

I am instead proposing a minimal 2-patch series:

1/2 records whether the platform enforces SPD write disable at probe
time (no behavior change).
2/2 avoids regcache writeback during suspend/resume when the device
operates in read-only mode, while still allowing read access to
temperature inputs.

This avoids issuing SMBus transactions that are architecturally
blocked on these systems, and does not rely on
delays or PM ordering assumptions, and leaves behavior unchanged on
platforms where SPD writes are permitted.

If this direction looks acceptable, I’m happy to re-spin and post the
series formally.

Thanks again for the guidance.


[   13.530830] i2c-core: driver [spd5118] registered
...
[   29.555298] i801_smbus 0000:00:1f.4: SPD Write Disable is set
...
[   29.582447] i2c i2c-14: Creating spd5118 at 0x52
[   29.590744] spd5118 14-0052: probe
...
[   29.618983] spd5118 14-0052: DDR5 temperature sensor: vendor
0x00:0xb3 revision 2.2
[   29.619662] i2c i2c-14: client [spd5118] registered with bus id 14-0052
...
[ 1057.504362] PM: suspend entry (s2idle)
[ 1057.944405] spd5118 14-0052: Entering suspend path...
[ 1057.945387] i801_smbus 0000:00:1f.4: i801 access: command=1a,
size=8, addr=0x52, read_write=1
[ 1057.946251] i801_smbus 0000:00:1f.4: i801 access: command=b,
size=8, addr=0x52, read_write=1
[ 1057.946866] i801_smbus 0000:00:1f.4: i801 access: command=1a,
size=8, addr=0x52, read_write=1
[ 1057.948324] i801_smbus 0000:00:1f.4: i801 access: command=b,
size=8, addr=0x52, read_write=1
[ 1057.949817] i801_smbus 0000:00:1f.4: i801 access: command=1a,
size=8, addr=0x52, read_write=0
[ 1057.949904] i801_smbus 0000:00:1f.4: i801 timeout: status=0x4,
SMBHSTCNT=0x15, SMBHSTSTS_DEV_ERR=4
[ 1057.949952] i801_smbus 0000:00:1f.4: No response
[ 1057.950215] i801_smbus 0000:00:1f.4: Entering suspend path...
[ 1059.338647] ACPI: EC: interrupt blocked
[ 1060.756385] ACPI: EC: interrupt unblocked
[ 1060.926423] i801_smbus 0000:00:1f.4: Entering resume path...
[ 1060.926623] spd5118 14-0052: Entering resume path...
[ 1060.927930] i801_smbus 0000:00:1f.4: i801 access: command=b,
size=8, addr=0x52, read_write=0
[ 1060.927969] i801_smbus 0000:00:1f.4: i801 timeout: status=0x4,
SMBHSTCNT=0x15, SMBHSTSTS_DEV_ERR=4
[ 1060.927995] i801_smbus 0000:00:1f.4: No response
[ 1060.928477] i801_smbus 0000:00:1f.4: i801 access: command=b,
size=8, addr=0x52, read_write=0
[ 1060.928517] i801_smbus 0000:00:1f.4: i801 timeout: status=0x4,
SMBHSTCNT=0x15, SMBHSTSTS_DEV_ERR=4
[ 1060.928543] i801_smbus 0000:00:1f.4: No response
[ 1060.928582] spd5118 14-0052: Failed to write b = 0: -6
[ 1060.928707] spd5118 14-0052: PM: dpm_run_callback(): spd5118_resume
returns -6
[ 1060.928981] spd5118 14-0052: PM: failed to resume async: error -6
[ 1062.414560] PM: suspend exit

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ