[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6249756d-7e49-464e-bb7e-11dfba3085f3@roeck-us.net>
Date: Tue, 27 Jan 2026 06:33:34 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: TINSAE TADESSE <tinsaetadesse2015@...il.com>
Cc: Armin Wolf <W_Armin@....de>,
"linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] hwmon: spd5118: Do not fail resume on temporary I2C
errors
On 1/27/26 02:35, TINSAE TADESSE wrote:
>
>
> On Mon, Jan 26, 2026 at 1:36 AM Guenter Roeck <linux@...ck-us.net <mailto:linux@...ck-us.net>> wrote:
> >
> > On 1/24/26 11:11, Armin Wolf wrote:
> > > Am 24.01.26 um 15:45 schrieb TINSAE TADESSE:
> > >
> > >> On Fri, Jan 16, 2026 at 9:24 AM Guenter Roeck <linux@...ck-us.net <mailto:linux@...ck-us.net>> wrote:
> > >>> On 1/15/26 05:50, TINSAE TADESSE wrote:
> > >>>> On Wed, Jan 14, 2026 at 5:23 PM Guenter Roeck <linux@...ck-us.net <mailto:linux@...ck-us.net>> wrote:
> > >>>>> On 1/14/26 05:07, TINSAE TADESSE wrote:
> > >>>>> ...
> > >>>>>>>> Hi Guenter,
> > >>>>>>>>
> > >>>>>>>> I tested changing the i801 SMBus controller to use
> > >>>>>>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() instead of
> > >>>>>>>> DEFINE_SIMPLE_DEV_PM_OPS() as a diagnostic experiment. With this
> > >>>>>>>> change, spd5118 resume failures (-ENXIO)
> > >>>>>>>> still persist, suggesting PM ordering alone is insufficient and other
> > >>>>>>>> firmware interactions are involved.
> > >>>>>>> How about the problem in the suspend function ? Is that also still seen ?
> > >>>>>>>
> > >>>>>>> Also, the subject talks about -EIO. Is that still seen ?
> > >>>>>>>
> > >>>>>>> Either case, can you enable debug logs for the i801 driver ?
> > >>>>>>> It should generate log entries when it reports errors.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Guenter
> > >>>>>>>
> > >>>>>> Hi Guenter,
> > >>>>>>
> > >>>>>> Thank you for the questions. To clarify:
> > >>>>>>
> > >>>>> Please do not drop mailing lists from replies.
> > >>>>>
> > >>>>>> 1) I have not observed any failures in the suspend path. The suspend
> > >>>>>> callback completes successfully, and
> > >>>>>> I have not seen I2C errors or warnings during suspend at any point.
> > >>>>> Sorry, I seem to be missing something.
> > >>>>>
> > >>>>> In that case, what is the point of patch 3/3 of your series which
> > >>>>> removes hardware accesses from the suspend function ?
> > >>>>>
> > >>>>>> 2) I have also not observed -EIO in my testing. The error consistently
> > >>>>>> reported on resume and subsequent hwmon access is -ENXIO.
> > >>>>>> Earlier references to -EIO were based on assumptions rather than
> > >>>>>> observed logs, and I should have been clearer about that.
> > >>>>>>
> > >>>>> Thanks for the clarification.
> > >>>>>
> > >>>>> Guenter
> > >>>>>
> > >>>>>> I am enabling debug logging for the i801 driver to collect more
> > >>>>>> concrete evidence of controller state during resume.
> > >>>> Hi Guenter,
> > >>>>
> > >>>>> Sorry, I seem to be missing something.
> > >>>>>
> > >>>>> In that case, what is the point of patch 3/3 of your series which
> > >>>>> removes hardware accesses from the suspend function ?
> > >>>> You are right to question this, and I agree that it needs clarification.
> > >>>>
> > >>>> Patch 3/3 was originally proposed under the assumption that the resume failures
> > >>>> were caused by spd5118 performing I2C transactions while the
> > >>>> controller was not yet available,
> > >>>> and that removing hardware accesses from the suspend path might
> > >>>> mitigate the issue.
> > >>>> At that point, I assumed the problem was limited to the resume callback.
> > >>>>
> > >>>> After enabling detailed i801 debug logging and testing with
> > >>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() in the i801 driver,
> > >>>> it became clear that this assumption was incorrect. The controller
> > >>>> itself reports "i801_smbus: No response"
> > >>>> both during suspend and immediately after resume, and spd5118 merely
> > >>>> propagates the resulting -ENXIO.
> > >>> Outch, that really hurts, because it means that something is seriously
> > >>> broken in both the suspend and resume path. The device _must_ be accessible
> > >>> in the suspend path. Otherwise there is no guarantee that the device is
> > >>> accessible for normal (pre-suspend) operation. After all, someone could
> > >>> run a script reading sysfs attributes in a tight loop continuously,
> > >>> or the thermal subsystem could try to access the chip. That would suddenly
> > >>> start to fail if something in the device access path starts to be suspended
> > >>> while the underlying hardware is still believed to be operational.
> > >>>
> > >>> I could imagine some hack/quirk for the resume path, such as delaying resume
> > >>> for some period of time for affected hardware, but I have no idea what to
> > >>> do on the suspend side. We can not just drop device writes during suspend
> > >>> because some broken hardware/firmware does not let us actually access
> > >>> (and thus suspend) the hardware anymore by the time the suspend function
> > >>> is called.
> > >>>
> > >>> Guenter
> > >>>
> > >>>> This indicates that the issue is not caused by spd5118 suspend/resume
> > >>>> behavior, but by the unavailability of the
> > >>>> SMBus controller due to platform or firmware interactions during
> > >>>> s2idle transitions.
> > >>>>
> > >>>> Given this, I agree that patch 3/3 does not address the root cause and
> > >>>> does not provide a justified improvement.
> > >>>> I am therefore fine with dropping it.
> > >>>>
> > >>>> Thank you for pointing this out.
> > >>>>
> > >> Hi Guenter,
> > >>
> > >> Thanks for the continued review and for questioning the earlier
> > >> direction — that helped narrow this down properly.
> > >>
> > >> After enabling full i801 debug logging (included below in this email)
> > >> and inspecting both drivers, it became clear that the resume
> > >> failures are not caused by spd5118 accessing the hardware too
> > >> early, nor by PM ordering issues. Instead, the SMBus controller
> > >> explicitly reports “SPD Write Disable is set”, and any
> > >> block write transactions to the SPD device consistently fail with
> > >> DEV_ERR. spd5118 merely propagates the resulting -ENXIO.
> > >
> > > Oh no, this likely happens even when merely reading values, as the spd5118
> > > uses a page register to switch between different register pages. In order
> > > to access temperature data (page 0), you might already have to issue a
> > > write access to the page register. The only reason why it works for you
> > > is that the spd5118 likely already has page 0 selected by the system firmware
> > > during boot.
> > >
> >
> > Exactly. There is no guarantee that page 0 is selected.
> >
> > >> With that in mind, I have dropped the earlier patch that attempted
> > >> to remove hardware access from the suspend path
> > >> unconditionally.
> > >> That patch does not address the root cause and is no longer
> > >> part of the series.
> > >>
> > >> I am instead proposing a minimal 2-patch series:
> > >>
> > >> 1/2 records whether the platform enforces SPD write disable at probe
> > >> time (no behavior change).
> > >> 2/2 avoids regcache writeback during suspend/resume when the device
> > >> operates in read-only mode, while still allowing read access to
> > >> temperature inputs.
> > >>
> > >> This avoids issuing SMBus transactions that are architecturally
> > >> blocked on these systems, and does not rely on
> > >> delays or PM ordering assumptions, and leaves behavior unchanged on
> > >> platforms where SPD writes are permitted.
> > >>
> > >> If this direction looks acceptable, I’m happy to re-spin and post the
> > >> series formally.
> > >>
> > >> Thanks again for the guidance.
> > >
> > > I do not know if this is a reliable solution, as the system firmware might
> > > select a different register page during resume. This will then prevent the
> > > driver from functioning.
> > >
> >
> > No, it is not reliable. The driver is simply not usable in this scenario.
> > This isn't just the temperature sensor code - the eeprom code is affected
> > as well.
> >
> > > I would love to see the spd5118 driver working on such systems with reduced
> > > functionality, but i will leave it to Guenter to decide if this approach is
> > > maintainable.
> > >
> > > Besides that: did the spd5118 driver load automatically on your device?
> > >
> >
> > I thought that was disabled. The i801 driver is supposed to detect if write
> > protect is enabled and, if so, it is supposed to not instantiate the spd5118
> > driver for DDR3. Support for this was added with commit 4d6d35d3417d ("i2c:
> > smbus: introduce Write Disable-aware SPD instantiating functions"). Apparently
> > the code to do this never made it into the i801 driver.
> >
> > The i801 driver needs to be fixed to inform the spd initialization code
> > that the spd5118 address range is write protected. The patch to do this was
> > "i2c: i801: Do not instantiate spd5118 under SPD Write Disable". I have no idea
> > why that patch didn't make it upstream.
> >
> > Guenter
> >
>
> Hi Guenter,
>
> > > The i801 driver needs to be fixed to inform the spd initialization code
> > > that the spd5118 address range is write protected. The patch to do
> > > this was
> > > "i2c: i801: Do not instantiate spd5118 under SPD Write Disable". I
> > > have no idea
> > > why that patch didn't make it upstream.
>
> I initially considered exposing SPD write-protection as a capability to be consumed by spd5118.
> However, spd5118 already depends on firmware-initialized state (page selection) and cannot reliably
> determine safe operation at probe time without issuing writes.
> Given that, suppressing SPD instantiation in the i801 driver when SPD Write Disable is
> the only solution here.
>
> Tested the remaining patch [1], and it does not seem to fix the issue, as it is added at the wrong stage
> in the initialization of the i801 driver. The attached log shows that spd5118 is instantiated and fully probed
> before the "do not instantiate spd5118 under SPD Write Disable" is acted upon.
> This confirms that suppressing instantiation must occur before adapter registration in i801.
> Once the adapter is registered, the SPD scan has already happened, and spd5118 has bound successfully
> based on the firmware-initialized state, even though correct operation is impossible.
> Therefore, the fix must be run prior to adapter registration.
>
> [1] https://lore.kernel.org/all/20250430-for-upstream-i801-spd5118-no-instantiate-v2-2-2f54d91ae2c7@canonical.com/ <https://lore.kernel.org/all/20250430-for-upstream-i801-spd5118-no-instantiate-v2-2-2f54d91ae2c7@canonical.com/>
>
>
> The fix is to register the SPD write-disable policy before i2c_add_adapter(), so the I2C core never
> probes SPD addresses on write-protected platforms.
>
Do you have SENSORS_SPD5118_DETECT enabled in your configuration ? It should be disabled
on systems with DMI enabled.
Guenter
Powered by blists - more mailing lists