[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ12PfODdy+rgBumHv5gUeaqikUGMkADg-UoBLuZPrtBanF40w@mail.gmail.com>
Date: Tue, 27 Jan 2026 22:23:24 +0300
From: TINSAE TADESSE <tinsaetadesse2015@...il.com>
To: Guenter Roeck <linux@...ck-us.net>
Cc: Armin Wolf <W_Armin@....de>,
"linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] hwmon: spd5118: Do not fail resume on temporary I2C errors
On Tue, Jan 27, 2026 at 7:39 PM Guenter Roeck <linux@...ck-us.net> wrote:
>
> On 1/27/26 02:35, TINSAE TADESSE wrote:
> >
> >
> > On Mon, Jan 26, 2026 at 1:36 AM Guenter Roeck <linux@...ck-us.net <mailto:linux@...ck-us.net>> wrote:
> > >
> > > On 1/24/26 11:11, Armin Wolf wrote:
> > > > Am 24.01.26 um 15:45 schrieb TINSAE TADESSE:
> > > >
> > > >> On Fri, Jan 16, 2026 at 9:24 AM Guenter Roeck <linux@...ck-us.net <mailto:linux@...ck-us.net>> wrote:
> > > >>> On 1/15/26 05:50, TINSAE TADESSE wrote:
> > > >>>> On Wed, Jan 14, 2026 at 5:23 PM Guenter Roeck <linux@...ck-us.net <mailto:linux@...ck-us.net>> wrote:
> > > >>>>> On 1/14/26 05:07, TINSAE TADESSE wrote:
> > > >>>>> ...
> > > >>>>>>>> Hi Guenter,
> > > >>>>>>>>
> > > >>>>>>>> I tested changing the i801 SMBus controller to use
> > > >>>>>>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() instead of
> > > >>>>>>>> DEFINE_SIMPLE_DEV_PM_OPS() as a diagnostic experiment. With this
> > > >>>>>>>> change, spd5118 resume failures (-ENXIO)
> > > >>>>>>>> still persist, suggesting PM ordering alone is insufficient and other
> > > >>>>>>>> firmware interactions are involved.
> > > >>>>>>> How about the problem in the suspend function ? Is that also still seen ?
> > > >>>>>>>
> > > >>>>>>> Also, the subject talks about -EIO. Is that still seen ?
> > > >>>>>>>
> > > >>>>>>> Either case, can you enable debug logs for the i801 driver ?
> > > >>>>>>> It should generate log entries when it reports errors.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Guenter
> > > >>>>>>>
> > > >>>>>> Hi Guenter,
> > > >>>>>>
> > > >>>>>> Thank you for the questions. To clarify:
> > > >>>>>>
> > > >>>>> Please do not drop mailing lists from replies.
> > > >>>>>
> > > >>>>>> 1) I have not observed any failures in the suspend path. The suspend
> > > >>>>>> callback completes successfully, and
> > > >>>>>> I have not seen I2C errors or warnings during suspend at any point.
> > > >>>>> Sorry, I seem to be missing something.
> > > >>>>>
> > > >>>>> In that case, what is the point of patch 3/3 of your series which
> > > >>>>> removes hardware accesses from the suspend function ?
> > > >>>>>
> > > >>>>>> 2) I have also not observed -EIO in my testing. The error consistently
> > > >>>>>> reported on resume and subsequent hwmon access is -ENXIO.
> > > >>>>>> Earlier references to -EIO were based on assumptions rather than
> > > >>>>>> observed logs, and I should have been clearer about that.
> > > >>>>>>
> > > >>>>> Thanks for the clarification.
> > > >>>>>
> > > >>>>> Guenter
> > > >>>>>
> > > >>>>>> I am enabling debug logging for the i801 driver to collect more
> > > >>>>>> concrete evidence of controller state during resume.
> > > >>>> Hi Guenter,
> > > >>>>
> > > >>>>> Sorry, I seem to be missing something.
> > > >>>>>
> > > >>>>> In that case, what is the point of patch 3/3 of your series which
> > > >>>>> removes hardware accesses from the suspend function ?
> > > >>>> You are right to question this, and I agree that it needs clarification.
> > > >>>>
> > > >>>> Patch 3/3 was originally proposed under the assumption that the resume failures
> > > >>>> were caused by spd5118 performing I2C transactions while the
> > > >>>> controller was not yet available,
> > > >>>> and that removing hardware accesses from the suspend path might
> > > >>>> mitigate the issue.
> > > >>>> At that point, I assumed the problem was limited to the resume callback.
> > > >>>>
> > > >>>> After enabling detailed i801 debug logging and testing with
> > > >>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() in the i801 driver,
> > > >>>> it became clear that this assumption was incorrect. The controller
> > > >>>> itself reports "i801_smbus: No response"
> > > >>>> both during suspend and immediately after resume, and spd5118 merely
> > > >>>> propagates the resulting -ENXIO.
> > > >>> Outch, that really hurts, because it means that something is seriously
> > > >>> broken in both the suspend and resume path. The device _must_ be accessible
> > > >>> in the suspend path. Otherwise there is no guarantee that the device is
> > > >>> accessible for normal (pre-suspend) operation. After all, someone could
> > > >>> run a script reading sysfs attributes in a tight loop continuously,
> > > >>> or the thermal subsystem could try to access the chip. That would suddenly
> > > >>> start to fail if something in the device access path starts to be suspended
> > > >>> while the underlying hardware is still believed to be operational.
> > > >>>
> > > >>> I could imagine some hack/quirk for the resume path, such as delaying resume
> > > >>> for some period of time for affected hardware, but I have no idea what to
> > > >>> do on the suspend side. We can not just drop device writes during suspend
> > > >>> because some broken hardware/firmware does not let us actually access
> > > >>> (and thus suspend) the hardware anymore by the time the suspend function
> > > >>> is called.
> > > >>>
> > > >>> Guenter
> > > >>>
> > > >>>> This indicates that the issue is not caused by spd5118 suspend/resume
> > > >>>> behavior, but by the unavailability of the
> > > >>>> SMBus controller due to platform or firmware interactions during
> > > >>>> s2idle transitions.
> > > >>>>
> > > >>>> Given this, I agree that patch 3/3 does not address the root cause and
> > > >>>> does not provide a justified improvement.
> > > >>>> I am therefore fine with dropping it.
> > > >>>>
> > > >>>> Thank you for pointing this out.
> > > >>>>
> > > >> Hi Guenter,
> > > >>
> > > >> Thanks for the continued review and for questioning the earlier
> > > >> direction — that helped narrow this down properly.
> > > >>
> > > >> After enabling full i801 debug logging (included below in this email)
> > > >> and inspecting both drivers, it became clear that the resume
> > > >> failures are not caused by spd5118 accessing the hardware too
> > > >> early, nor by PM ordering issues. Instead, the SMBus controller
> > > >> explicitly reports “SPD Write Disable is set”, and any
> > > >> block write transactions to the SPD device consistently fail with
> > > >> DEV_ERR. spd5118 merely propagates the resulting -ENXIO.
> > > >
> > > > Oh no, this likely happens even when merely reading values, as the spd5118
> > > > uses a page register to switch between different register pages. In order
> > > > to access temperature data (page 0), you might already have to issue a
> > > > write access to the page register. The only reason why it works for you
> > > > is that the spd5118 likely already has page 0 selected by the system firmware
> > > > during boot.
> > > >
> > >
> > > Exactly. There is no guarantee that page 0 is selected.
> > >
> > > >> With that in mind, I have dropped the earlier patch that attempted
> > > >> to remove hardware access from the suspend path
> > > >> unconditionally.
> > > >> That patch does not address the root cause and is no longer
> > > >> part of the series.
> > > >>
> > > >> I am instead proposing a minimal 2-patch series:
> > > >>
> > > >> 1/2 records whether the platform enforces SPD write disable at probe
> > > >> time (no behavior change).
> > > >> 2/2 avoids regcache writeback during suspend/resume when the device
> > > >> operates in read-only mode, while still allowing read access to
> > > >> temperature inputs.
> > > >>
> > > >> This avoids issuing SMBus transactions that are architecturally
> > > >> blocked on these systems, and does not rely on
> > > >> delays or PM ordering assumptions, and leaves behavior unchanged on
> > > >> platforms where SPD writes are permitted.
> > > >>
> > > >> If this direction looks acceptable, I’m happy to re-spin and post the
> > > >> series formally.
> > > >>
> > > >> Thanks again for the guidance.
> > > >
> > > > I do not know if this is a reliable solution, as the system firmware might
> > > > select a different register page during resume. This will then prevent the
> > > > driver from functioning.
> > > >
> > >
> > > No, it is not reliable. The driver is simply not usable in this scenario.
> > > This isn't just the temperature sensor code - the eeprom code is affected
> > > as well.
> > >
> > > > I would love to see the spd5118 driver working on such systems with reduced
> > > > functionality, but i will leave it to Guenter to decide if this approach is
> > > > maintainable.
> > > >
> > > > Besides that: did the spd5118 driver load automatically on your device?
> > > >
> > >
> > > I thought that was disabled. The i801 driver is supposed to detect if write
> > > protect is enabled and, if so, it is supposed to not instantiate the spd5118
> > > driver for DDR3. Support for this was added with commit 4d6d35d3417d ("i2c:
> > > smbus: introduce Write Disable-aware SPD instantiating functions"). Apparently
> > > the code to do this never made it into the i801 driver.
> > >
> > > The i801 driver needs to be fixed to inform the spd initialization code
> > > that the spd5118 address range is write protected. The patch to do this was
> > > "i2c: i801: Do not instantiate spd5118 under SPD Write Disable". I have no idea
> > > why that patch didn't make it upstream.
> > >
> > > Guenter
> > >
> >
> > Hi Guenter,
> >
> > > > The i801 driver needs to be fixed to inform the spd initialization code
> > > > that the spd5118 address range is write protected. The patch to do
> > > > this was
> > > > "i2c: i801: Do not instantiate spd5118 under SPD Write Disable". I
> > > > have no idea
> > > > why that patch didn't make it upstream.
> >
> > I initially considered exposing SPD write-protection as a capability to be consumed by spd5118.
> > However, spd5118 already depends on firmware-initialized state (page selection) and cannot reliably
> > determine safe operation at probe time without issuing writes.
> > Given that, suppressing SPD instantiation in the i801 driver when SPD Write Disable is
> > the only solution here.
> >
> > Tested the remaining patch [1], and it does not seem to fix the issue, as it is added at the wrong stage
> > in the initialization of the i801 driver. The attached log shows that spd5118 is instantiated and fully probed
> > before the "do not instantiate spd5118 under SPD Write Disable" is acted upon.
> > This confirms that suppressing instantiation must occur before adapter registration in i801.
> > Once the adapter is registered, the SPD scan has already happened, and spd5118 has bound successfully
> > based on the firmware-initialized state, even though correct operation is impossible.
> > Therefore, the fix must be run prior to adapter registration.
> >
> > [1] https://lore.kernel.org/all/20250430-for-upstream-i801-spd5118-no-instantiate-v2-2-2f54d91ae2c7@canonical.com/ <https://lore.kernel.org/all/20250430-for-upstream-i801-spd5118-no-instantiate-v2-2-2f54d91ae2c7@canonical.com/>
> >
> >
> > The fix is to register the SPD write-disable policy before i2c_add_adapter(), so the I2C core never
> > probes SPD addresses on write-protected platforms.
> >
>
> Do you have SENSORS_SPD5118_DETECT enabled in your configuration ? It should be disabled
> on systems with DMI enabled.
>
> Guenter
>
Hello Guenter,
Disabling CONFIG_SENSORS_SPD5118_DETECT completely avoids the issue on
affected platforms,
even without any code changes. This confirms that the failures are
triggered specifically by automatic
SPD5118 instantiation on systems where the i801 controller enforces
SPD Write Disable.
The existing Kconfig already documents that auto-detection is optional
and that, on x86, instantiation
is expected to be coordinated by the SMBus subsystem.
The fact that disabling auto-detection avoids the issue is evidence
that instantiation itself is
unsafe under SPD Write Disable, not that spd5118 is malfunctioning.
The existing Kconfig help already says "Disabling the detect function
will speed up boot time and reduce
the risk of mis-detecting…", and the attached patch documents the
reason why mis-detection can happen.
View attachment "0001-hwmon-spd5118-Document-detect-limitations-under-SPD-.patch" of type "text/x-patch" (1544 bytes)
Powered by blists - more mailing lists