lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ12PfOFNZPnxPecqHHEPLhcLa8iN_rBHF11-XA7OYZCxtPZ=A@mail.gmail.com>
Date: Tue, 27 Jan 2026 13:30:24 +0300
From: TINSAE TADESSE <tinsaetadesse2015@...il.com>
To: Armin Wolf <W_Armin@....de>
Cc: Guenter Roeck <linux@...ck-us.net>, 
	"linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] hwmon: spd5118: Do not fail resume on temporary I2C errors

On Sat, Jan 24, 2026 at 10:11 PM Armin Wolf <W_Armin@....de> wrote:
>
> Am 24.01.26 um 15:45 schrieb TINSAE TADESSE:
>
> > On Fri, Jan 16, 2026 at 9:24 AM Guenter Roeck <linux@...ck-us.net> wrote:
> >> On 1/15/26 05:50, TINSAE TADESSE wrote:
> >>> On Wed, Jan 14, 2026 at 5:23 PM Guenter Roeck <linux@...ck-us.net> wrote:
> >>>> On 1/14/26 05:07, TINSAE TADESSE wrote:
> >>>> ...
> >>>>>>> Hi Guenter,
> >>>>>>>
> >>>>>>> I tested changing the i801 SMBus controller to use
> >>>>>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() instead of
> >>>>>>> DEFINE_SIMPLE_DEV_PM_OPS() as a diagnostic experiment. With this
> >>>>>>> change, spd5118 resume failures (-ENXIO)
> >>>>>>> still persist, suggesting PM ordering alone is insufficient and other
> >>>>>>> firmware interactions are involved.
> >>>>>> How about the problem in the suspend function ? Is that also still seen ?
> >>>>>>
> >>>>>> Also, the subject talks about -EIO. Is that still seen ?
> >>>>>>
> >>>>>> Either case, can you enable debug logs for the i801 driver ?
> >>>>>> It should generate log entries when it reports errors.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Guenter
> >>>>>>
> >>>>> Hi Guenter,
> >>>>>
> >>>>> Thank you for the questions. To clarify:
> >>>>>
> >>>> Please do not drop mailing lists from replies.
> >>>>
> >>>>> 1) I have not observed any failures in the suspend path. The suspend
> >>>>> callback completes successfully, and
> >>>>> I have not seen I2C errors or warnings during suspend at any point.
> >>>> Sorry, I seem to be missing something.
> >>>>
> >>>> In that case, what is the point of patch 3/3 of your series which
> >>>> removes hardware accesses from the suspend function ?
> >>>>
> >>>>> 2) I have also not observed -EIO in my testing. The error consistently
> >>>>> reported on resume and subsequent hwmon access is -ENXIO.
> >>>>> Earlier references to -EIO were based on assumptions rather than
> >>>>> observed logs, and I should have been clearer about that.
> >>>>>
> >>>> Thanks for the clarification.
> >>>>
> >>>> Guenter
> >>>>
> >>>>> I am enabling debug logging for the i801 driver to collect more
> >>>>> concrete evidence of controller state during resume.
> >>> Hi Guenter,
> >>>
> >>>> Sorry, I seem to be missing something.
> >>>>
> >>>> In that case, what is the point of patch 3/3 of your series which
> >>>> removes hardware accesses from the suspend function ?
> >>> You are right to question this, and I agree that it needs clarification.
> >>>
> >>> Patch 3/3 was originally proposed under the assumption that the resume failures
> >>> were caused by spd5118 performing I2C transactions while the
> >>> controller was not yet available,
> >>> and that removing hardware accesses from the suspend path might
> >>> mitigate the issue.
> >>> At that point, I assumed the problem was limited to the resume callback.
> >>>
> >>> After enabling detailed i801 debug logging and testing with
> >>> SET_LATE_SYSTEM_SLEEP_PM_OPS() in the i801 driver,
> >>> it became clear that this assumption was incorrect. The controller
> >>> itself reports "i801_smbus: No response"
> >>> both during suspend and immediately after resume, and spd5118 merely
> >>> propagates the resulting -ENXIO.
> >> Outch, that really hurts, because it means that something is seriously
> >> broken in both the suspend and resume path. The device _must_ be accessible
> >> in the suspend path. Otherwise there is no guarantee that the device is
> >> accessible for normal (pre-suspend) operation. After all, someone could
> >> run a script reading sysfs attributes in a tight loop continuously,
> >> or the thermal subsystem could try to access the chip. That would suddenly
> >> start to fail if something in the device access path starts to be suspended
> >> while the underlying hardware is still believed to be operational.
> >>
> >> I could imagine some hack/quirk for the resume path, such as delaying resume
> >> for some period of time for affected hardware, but I have no idea what to
> >> do on the suspend side. We can not just drop device writes during suspend
> >> because some broken hardware/firmware does not let us actually access
> >> (and thus suspend) the hardware anymore by the time the suspend function
> >> is called.
> >>
> >> Guenter
> >>
> >>> This indicates that the issue is not caused by spd5118 suspend/resume
> >>> behavior, but by the unavailability of the
> >>> SMBus controller due to platform or firmware interactions during
> >>> s2idle transitions.
> >>>
> >>> Given this, I agree that patch 3/3 does not address the root cause and
> >>> does not provide a justified improvement.
> >>> I am therefore fine with dropping it.
> >>>
> >>> Thank you for pointing this out.
> >>>
> > Hi Guenter,
> >
> > Thanks for the continued review and for questioning the earlier
> > direction — that helped narrow this down properly.
> >
> > After enabling full i801 debug logging (included below in this email)
> > and inspecting both drivers, it became clear that the resume
> > failures are not caused by spd5118 accessing the hardware too
> > early, nor by PM ordering issues. Instead, the SMBus controller
> > explicitly reports “SPD Write Disable is set”, and any
> > block write transactions to the SPD device consistently fail with
> > DEV_ERR. spd5118 merely propagates the resulting -ENXIO.
>
> Oh no, this likely happens even when merely reading values, as the spd5118
> uses a page register to switch between different register pages. In order
> to access temperature data (page 0), you might already have to issue a
> write access to the page register. The only reason why it works for you
> is that the spd5118 likely already has page 0 selected by the system firmware
> during boot.
>
> > With that in mind, I have dropped the earlier patch that attempted
> > to remove hardware access from the suspend path
> > unconditionally.
> > That patch does not address the root cause and is no longer
> > part of the series.
> >
> > I am instead proposing a minimal 2-patch series:
> >
> > 1/2 records whether the platform enforces SPD write disable at probe
> > time (no behavior change).
> > 2/2 avoids regcache writeback during suspend/resume when the device
> > operates in read-only mode, while still allowing read access to
> > temperature inputs.
> >
> > This avoids issuing SMBus transactions that are architecturally
> > blocked on these systems, and does not rely on
> > delays or PM ordering assumptions, and leaves behavior unchanged on
> > platforms where SPD writes are permitted.
> >
> > If this direction looks acceptable, I’m happy to re-spin and post the
> > series formally.
> >
> > Thanks again for the guidance.
>
> I do not know if this is a reliable solution, as the system firmware might
> select a different register page during resume. This will then prevent the
> driver from functioning.
>
> I would love to see the spd5118 driver working on such systems with reduced
> functionality, but i will leave it to Guenter to decide if this approach is
> maintainable.
>
> Besides that: did the spd5118 driver load automatically on your device?
>
> Thanks,
> Armin Wolf
>
> >
> >
> > [   13.530830] i2c-core: driver [spd5118] registered
> > ...
> > [   29.555298] i801_smbus 0000:00:1f.4: SPD Write Disable is set
> > ...
> > [   29.582447] i2c i2c-14: Creating spd5118 at 0x52
> > [   29.590744] spd5118 14-0052: probe
> > ...
> > [   29.618983] spd5118 14-0052: DDR5 temperature sensor: vendor
> > 0x00:0xb3 revision 2.2
> > [   29.619662] i2c i2c-14: client [spd5118] registered with bus id 14-0052
> > ...
> > [ 1057.504362] PM: suspend entry (s2idle)
> > [ 1057.944405] spd5118 14-0052: Entering suspend path...
> > [ 1057.945387] i801_smbus 0000:00:1f.4: i801 access: command=1a,
> > size=8, addr=0x52, read_write=1
> > [ 1057.946251] i801_smbus 0000:00:1f.4: i801 access: command=b,
> > size=8, addr=0x52, read_write=1
> > [ 1057.946866] i801_smbus 0000:00:1f.4: i801 access: command=1a,
> > size=8, addr=0x52, read_write=1
> > [ 1057.948324] i801_smbus 0000:00:1f.4: i801 access: command=b,
> > size=8, addr=0x52, read_write=1
> > [ 1057.949817] i801_smbus 0000:00:1f.4: i801 access: command=1a,
> > size=8, addr=0x52, read_write=0
> > [ 1057.949904] i801_smbus 0000:00:1f.4: i801 timeout: status=0x4,
> > SMBHSTCNT=0x15, SMBHSTSTS_DEV_ERR=4
> > [ 1057.949952] i801_smbus 0000:00:1f.4: No response
> > [ 1057.950215] i801_smbus 0000:00:1f.4: Entering suspend path...
> > [ 1059.338647] ACPI: EC: interrupt blocked
> > [ 1060.756385] ACPI: EC: interrupt unblocked
> > [ 1060.926423] i801_smbus 0000:00:1f.4: Entering resume path...
> > [ 1060.926623] spd5118 14-0052: Entering resume path...
> > [ 1060.927930] i801_smbus 0000:00:1f.4: i801 access: command=b,
> > size=8, addr=0x52, read_write=0
> > [ 1060.927969] i801_smbus 0000:00:1f.4: i801 timeout: status=0x4,
> > SMBHSTCNT=0x15, SMBHSTSTS_DEV_ERR=4
> > [ 1060.927995] i801_smbus 0000:00:1f.4: No response
> > [ 1060.928477] i801_smbus 0000:00:1f.4: i801 access: command=b,
> > size=8, addr=0x52, read_write=0
> > [ 1060.928517] i801_smbus 0000:00:1f.4: i801 timeout: status=0x4,
> > SMBHSTCNT=0x15, SMBHSTSTS_DEV_ERR=4
> > [ 1060.928543] i801_smbus 0000:00:1f.4: No response
> > [ 1060.928582] spd5118 14-0052: Failed to write b = 0: -6
> > [ 1060.928707] spd5118 14-0052: PM: dpm_run_callback(): spd5118_resume
> > returns -6
> > [ 1060.928981] spd5118 14-0052: PM: failed to resume async: error -6
> > [ 1062.414560] PM: suspend exit
> >


Hi Armin,

> >> Besides that: did the spd5118 driver load automatically on your device?
Yes.


> >> Oh no, this likely happens even when merely reading values, as the
> >> spd5118
> >> uses a page register to switch between different register pages. In
> >> order
> >> to access temperature data (page 0), you might already have to issue a
> >> write access to the page register. The only reason why it works for you
> >> is that the spd5118 likely already has page 0 selected by the system
> >> firmware
> >> during boot.

I tested the assumption by explicitly attempting to switch the SPD5118
page in i2c_probe()
using the same mechanism as the existing recovery logic. On my
platform, the write to
SPD5118_REG_I2C_LEGACY_MODE fails with SMBHSTSTS_DEV_ERR, and the page
remains unchanged.
This demonstrates that page selection requires a write and cannot be
performed on platforms
with SPD write disable. Temperature reads therefore only work because
firmware has
pre-selected page 0, and Linux has no way to recover if that state is
lost (e.g. across suspend).
This confirms that a “read-only SPD” mode cannot be made robust.

/* Simulate later temp read attempts depend on startup page writes */
dev_info(dev, "(i2c-probe) Initial legacy_mode = 0x%02x\n", mode);
i2c_smbus_write_byte_data(client, SPD5118_REG_I2C_LEGACY_MODE, mode |
SPD5118_LEGACY_PAGE_MASK);
mode = i2c_smbus_read_byte_data(client, SPD5118_REG_I2C_LEGACY_MODE);
dev_info(dev, "(i2c-probe) Updated legacy_mode = 0x%02x\n", mode);

/* Log output */
i801_smbus 0000:00:1f.4: SPD Write Disable is set
i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
...
spd5118 14-0052: (i2c-probe) Initial legacy_mode = 0x00
i801_smbus 0000:00:1f.4: i801 access: command=b, size=2, addr=0x52, read_write=0
i801_smbus 0000:00:1f.4: i801 timeout: status=0x4, SMBHSTCNT=0x9,
SMBHSTSTS_DEV_ERR=4
i801_smbus 0000:00:1f.4: i801 access: command=b, size=2, addr=0x52, read_write=1
spd5118 14-0052: (i2c-probe) Updated legacy_mode = 0x00

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ