lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6249756d-7e49-464e-bb7e-11dfba3085f3@roeck-us.net>
Date: Tue, 27 Jan 2026 06:33:34 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: TINSAE TADESSE <tinsaetadesse2015@...il.com>
Cc: Armin Wolf <W_Armin@....de>,
 "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] hwmon: spd5118: Do not fail resume on temporary I2C
 errors

On 1/27/26 02:35, TINSAE TADESSE wrote:
> 
> 
> On Mon, Jan 26, 2026 at 1:36 AM Guenter Roeck <linux@...ck-us.net <mailto:linux@...ck-us.net>> wrote:
>  >
>  > On 1/24/26 11:11, Armin Wolf wrote:
>  > > Am 24.01.26 um 15:45 schrieb TINSAE TADESSE:
>  > >
>  > >> On Fri, Jan 16, 2026 at 9:24 AM Guenter Roeck <linux@...ck-us.net <mailto:linux@...ck-us.net>> wrote:
>  > >>> On 1/15/26 05:50, TINSAE TADESSE wrote:
>  > >>>> On Wed, Jan 14, 2026 at 5:23 PM Guenter Roeck <linux@...ck-us.net <mailto:linux@...ck-us.net>> wrote:
>  > >>>>> On 1/14/26 05:07, TINSAE TADESSE wrote:
>  > >>>>> ...
>  > >>>>>>>> Hi Guenter,
>  > >>>>>>>>
>  > >>>>>>>> I tested changing the i801 SMBus controller to use
>  > >>>>>>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() instead of
>  > >>>>>>>> DEFINE_SIMPLE_DEV_PM_OPS() as a diagnostic experiment. With this
>  > >>>>>>>> change, spd5118 resume failures (-ENXIO)
>  > >>>>>>>> still persist, suggesting PM ordering alone is insufficient and other
>  > >>>>>>>> firmware interactions are involved.
>  > >>>>>>> How about the problem in the suspend function ? Is that also still seen ?
>  > >>>>>>>
>  > >>>>>>> Also, the subject talks about -EIO. Is that still seen ?
>  > >>>>>>>
>  > >>>>>>> Either case, can you enable debug logs for the i801 driver ?
>  > >>>>>>> It should generate log entries when it reports errors.
>  > >>>>>>>
>  > >>>>>>> Thanks,
>  > >>>>>>> Guenter
>  > >>>>>>>
>  > >>>>>> Hi Guenter,
>  > >>>>>>
>  > >>>>>> Thank you for the questions. To clarify:
>  > >>>>>>
>  > >>>>> Please do not drop mailing lists from replies.
>  > >>>>>
>  > >>>>>> 1) I have not observed any failures in the suspend path. The suspend
>  > >>>>>> callback completes successfully, and
>  > >>>>>> I have not seen I2C errors or warnings during suspend at any point.
>  > >>>>> Sorry, I seem to be missing something.
>  > >>>>>
>  > >>>>> In that case, what is the point of patch 3/3 of your series which
>  > >>>>> removes hardware accesses from the suspend function ?
>  > >>>>>
>  > >>>>>> 2) I have also not observed -EIO in my testing. The error consistently
>  > >>>>>> reported on resume and subsequent hwmon access is -ENXIO.
>  > >>>>>> Earlier references to -EIO were based on assumptions rather than
>  > >>>>>> observed logs, and I should have been clearer about that.
>  > >>>>>>
>  > >>>>> Thanks for the clarification.
>  > >>>>>
>  > >>>>> Guenter
>  > >>>>>
>  > >>>>>> I am enabling debug logging for the i801 driver to collect more
>  > >>>>>> concrete evidence of controller state during resume.
>  > >>>> Hi Guenter,
>  > >>>>
>  > >>>>> Sorry, I seem to be missing something.
>  > >>>>>
>  > >>>>> In that case, what is the point of patch 3/3 of your series which
>  > >>>>> removes hardware accesses from the suspend function ?
>  > >>>> You are right to question this, and I agree that it needs clarification.
>  > >>>>
>  > >>>> Patch 3/3 was originally proposed under the assumption that the resume failures
>  > >>>> were caused by spd5118 performing I2C transactions while the
>  > >>>> controller was not yet available,
>  > >>>> and that removing hardware accesses from the suspend path might
>  > >>>> mitigate the issue.
>  > >>>> At that point, I assumed the problem was limited to the resume callback.
>  > >>>>
>  > >>>> After enabling detailed i801 debug logging and testing with
>  > >>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() in the i801 driver,
>  > >>>> it became clear that this assumption was incorrect. The controller
>  > >>>> itself reports "i801_smbus: No response"
>  > >>>> both during suspend and immediately after resume, and spd5118 merely
>  > >>>> propagates the resulting -ENXIO.
>  > >>> Outch, that really hurts, because it means that something is seriously
>  > >>> broken in both the suspend and resume path. The device _must_ be accessible
>  > >>> in the suspend path. Otherwise there is no guarantee that the device is
>  > >>> accessible for normal (pre-suspend) operation. After all, someone could
>  > >>> run a script reading sysfs attributes in a tight loop continuously,
>  > >>> or the thermal subsystem could try to access the chip. That would suddenly
>  > >>> start to fail if something in the device access path starts to be suspended
>  > >>> while the underlying hardware is still believed to be operational.
>  > >>>
>  > >>> I could imagine some hack/quirk for the resume path, such as delaying resume
>  > >>> for some period of time for affected hardware, but I have no idea what to
>  > >>> do on the suspend side. We can not just drop device writes during suspend
>  > >>> because some broken hardware/firmware does not let us actually access
>  > >>> (and thus suspend) the hardware anymore by the time the suspend function
>  > >>> is called.
>  > >>>
>  > >>> Guenter
>  > >>>
>  > >>>> This indicates that the issue is not caused by spd5118 suspend/resume
>  > >>>> behavior, but by the unavailability of the
>  > >>>> SMBus controller due to platform or firmware interactions during
>  > >>>> s2idle transitions.
>  > >>>>
>  > >>>> Given this, I agree that patch 3/3 does not address the root cause and
>  > >>>> does not provide a justified improvement.
>  > >>>> I am therefore fine with dropping it.
>  > >>>>
>  > >>>> Thank you for pointing this out.
>  > >>>>
>  > >> Hi Guenter,
>  > >>
>  > >> Thanks for the continued review and for questioning the earlier
>  > >> direction — that helped narrow this down properly.
>  > >>
>  > >> After enabling full i801 debug logging (included below in this email)
>  > >> and inspecting both drivers, it became clear that the resume
>  > >> failures are not caused by spd5118 accessing the hardware too
>  > >> early, nor by PM ordering issues. Instead, the SMBus controller
>  > >> explicitly reports “SPD Write Disable is set”, and any
>  > >> block write transactions to the SPD device consistently fail with
>  > >> DEV_ERR. spd5118 merely propagates the resulting -ENXIO.
>  > >
>  > > Oh no, this likely happens even when merely reading values, as the spd5118
>  > > uses a page register to switch between different register pages. In order
>  > > to access temperature data (page 0), you might already have to issue a
>  > > write access to the page register. The only reason why it works for you
>  > > is that the spd5118 likely already has page 0 selected by the system firmware
>  > > during boot.
>  > >
>  >
>  > Exactly. There is no guarantee that page 0 is selected.
>  >
>  > >> With that in mind, I have dropped the earlier patch that attempted
>  > >> to remove hardware access from the suspend path
>  > >> unconditionally.
>  > >> That patch does not address the root cause and is no longer
>  > >> part of the series.
>  > >>
>  > >> I am instead proposing a minimal 2-patch series:
>  > >>
>  > >> 1/2 records whether the platform enforces SPD write disable at probe
>  > >> time (no behavior change).
>  > >> 2/2 avoids regcache writeback during suspend/resume when the device
>  > >> operates in read-only mode, while still allowing read access to
>  > >> temperature inputs.
>  > >>
>  > >> This avoids issuing SMBus transactions that are architecturally
>  > >> blocked on these systems, and does not rely on
>  > >> delays or PM ordering assumptions, and leaves behavior unchanged on
>  > >> platforms where SPD writes are permitted.
>  > >>
>  > >> If this direction looks acceptable, I’m happy to re-spin and post the
>  > >> series formally.
>  > >>
>  > >> Thanks again for the guidance.
>  > >
>  > > I do not know if this is a reliable solution, as the system firmware might
>  > > select a different register page during resume. This will then prevent the
>  > > driver from functioning.
>  > >
>  >
>  > No, it is not reliable. The driver is simply not usable in this scenario.
>  > This isn't just the temperature sensor code - the eeprom code is affected
>  > as well.
>  >
>  > > I would love to see the spd5118 driver working on such systems with reduced
>  > > functionality, but i will leave it to Guenter to decide if this approach is
>  > > maintainable.
>  > >
>  > > Besides that: did the spd5118 driver load automatically on your device?
>  > >
>  >
>  > I thought that was disabled. The i801 driver is supposed to detect if write
>  > protect is enabled and, if so, it is supposed to not instantiate the spd5118
>  > driver for DDR3. Support for this was added with commit 4d6d35d3417d ("i2c:
>  > smbus: introduce Write Disable-aware SPD instantiating functions"). Apparently
>  > the code to do this never made it into the i801 driver.
>  >
>  > The i801 driver needs to be fixed to inform the spd initialization code
>  > that the spd5118 address range is write protected. The patch to do this was
>  > "i2c: i801: Do not instantiate spd5118 under SPD Write Disable". I have no idea
>  > why that patch didn't make it upstream.
>  >
>  > Guenter
>  >
> 
> Hi Guenter,
> 
>  > > The i801 driver needs to be fixed to inform the spd initialization code
>  > > that the spd5118 address range is write protected. The patch to do
>  > > this was
>  > > "i2c: i801: Do not instantiate spd5118 under SPD Write Disable". I
>  > > have no idea
>  > > why that patch didn't make it upstream.
> 
> I initially considered exposing SPD write-protection as a capability to be consumed by spd5118.
> However, spd5118 already depends on firmware-initialized state (page selection) and cannot reliably
> determine safe operation at probe time without issuing writes.
> Given that, suppressing SPD instantiation in the i801 driver when SPD Write Disable is
> the only solution here.
> 
> Tested the remaining patch [1], and it does not seem to fix the issue, as it is added at the wrong stage
> in the initialization of the i801 driver. The attached log shows that spd5118 is instantiated and fully probed
> before the "do not instantiate spd5118 under SPD Write Disable" is acted upon.
> This confirms that suppressing instantiation must occur before adapter registration in i801.
> Once the adapter is registered, the SPD scan has already happened, and spd5118 has bound successfully
> based on the firmware-initialized state, even though correct operation is impossible.
> Therefore, the fix must be run prior to adapter registration.
> 
> [1] https://lore.kernel.org/all/20250430-for-upstream-i801-spd5118-no-instantiate-v2-2-2f54d91ae2c7@canonical.com/ <https://lore.kernel.org/all/20250430-for-upstream-i801-spd5118-no-instantiate-v2-2-2f54d91ae2c7@canonical.com/>
> 
> 
> The fix is to register the SPD write-disable policy before i2c_add_adapter(), so the I2C core never
> probes SPD addresses on write-protected platforms.
> 

Do you have SENSORS_SPD5118_DETECT enabled in your configuration ? It should be disabled
on systems with DMI enabled.

Guenter


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ