lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <995092ad-7b99-4edc-b0eb-b4a3d3f5b1fd@gmx.de>
Date: Mon, 26 Jan 2026 10:40:51 +0100
From: Armin Wolf <W_Armin@....de>
To: Guenter Roeck <linux@...ck-us.net>,
 TINSAE TADESSE <tinsaetadesse2015@...il.com>
Cc: "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] hwmon: spd5118: Do not fail resume on temporary I2C
 errors

Am 25.01.26 um 23:36 schrieb Guenter Roeck:

> On 1/24/26 11:11, Armin Wolf wrote:
>> Am 24.01.26 um 15:45 schrieb TINSAE TADESSE:
>>
>>> On Fri, Jan 16, 2026 at 9:24 AM Guenter Roeck <linux@...ck-us.net> 
>>> wrote:
>>>> On 1/15/26 05:50, TINSAE TADESSE wrote:
>>>>> On Wed, Jan 14, 2026 at 5:23 PM Guenter Roeck <linux@...ck-us.net> 
>>>>> wrote:
>>>>>> On 1/14/26 05:07, TINSAE TADESSE wrote:
>>>>>> ...
>>>>>>>>> Hi Guenter,
>>>>>>>>>
>>>>>>>>> I tested changing the i801 SMBus controller to use
>>>>>>>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() instead of
>>>>>>>>> DEFINE_SIMPLE_DEV_PM_OPS() as a diagnostic experiment. With this
>>>>>>>>> change, spd5118 resume failures (-ENXIO)
>>>>>>>>> still persist, suggesting PM ordering alone is insufficient 
>>>>>>>>> and other
>>>>>>>>> firmware interactions are involved.
>>>>>>>> How about the problem in the suspend function ? Is that also 
>>>>>>>> still seen ?
>>>>>>>>
>>>>>>>> Also, the subject talks about -EIO. Is that still seen ?
>>>>>>>>
>>>>>>>> Either case, can you enable debug logs for the i801 driver ?
>>>>>>>> It should generate log entries when it reports errors.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Guenter
>>>>>>>>
>>>>>>> Hi Guenter,
>>>>>>>
>>>>>>> Thank you for the questions. To clarify:
>>>>>>>
>>>>>> Please do not drop mailing lists from replies.
>>>>>>
>>>>>>> 1) I have not observed any failures in the suspend path. The 
>>>>>>> suspend
>>>>>>> callback completes successfully, and
>>>>>>> I have not seen I2C errors or warnings during suspend at any point.
>>>>>> Sorry, I seem to be missing something.
>>>>>>
>>>>>> In that case, what is the point of patch 3/3 of your series which
>>>>>> removes hardware accesses from the suspend function ?
>>>>>>
>>>>>>> 2) I have also not observed -EIO in my testing. The error 
>>>>>>> consistently
>>>>>>> reported on resume and subsequent hwmon access is -ENXIO.
>>>>>>> Earlier references to -EIO were based on assumptions rather than
>>>>>>> observed logs, and I should have been clearer about that.
>>>>>>>
>>>>>> Thanks for the clarification.
>>>>>>
>>>>>> Guenter
>>>>>>
>>>>>>> I am enabling debug logging for the i801 driver to collect more
>>>>>>> concrete evidence of controller state during resume.
>>>>> Hi Guenter,
>>>>>
>>>>>> Sorry, I seem to be missing something.
>>>>>>
>>>>>> In that case, what is the point of patch 3/3 of your series which
>>>>>> removes hardware accesses from the suspend function ?
>>>>> You are right to question this, and I agree that it needs 
>>>>> clarification.
>>>>>
>>>>> Patch 3/3 was originally proposed under the assumption that the 
>>>>> resume failures
>>>>> were caused by spd5118 performing I2C transactions while the
>>>>> controller was not yet available,
>>>>> and that removing hardware accesses from the suspend path might
>>>>> mitigate the issue.
>>>>> At that point, I assumed the problem was limited to the resume 
>>>>> callback.
>>>>>
>>>>> After enabling detailed i801 debug logging and testing with
>>>>> SET_LATE_SYSTEM_SLEEP_PM_OPS() in the i801 driver,
>>>>> it became clear that this assumption was incorrect. The controller
>>>>> itself reports "i801_smbus: No response"
>>>>> both during suspend and immediately after resume, and spd5118 merely
>>>>> propagates the resulting -ENXIO.
>>>> Outch, that really hurts, because it means that something is seriously
>>>> broken in both the suspend and resume path. The device _must_ be 
>>>> accessible
>>>> in the suspend path. Otherwise there is no guarantee that the 
>>>> device is
>>>> accessible for normal (pre-suspend) operation. After all, someone 
>>>> could
>>>> run a script reading sysfs attributes in a tight loop continuously,
>>>> or the thermal subsystem could try to access the chip. That would 
>>>> suddenly
>>>> start to fail if something in the device access path starts to be 
>>>> suspended
>>>> while the underlying hardware is still believed to be operational.
>>>>
>>>> I could imagine some hack/quirk for the resume path, such as 
>>>> delaying resume
>>>> for some period of time for affected hardware, but I have no idea 
>>>> what to
>>>> do on the suspend side. We can not just drop device writes during 
>>>> suspend
>>>> because some broken hardware/firmware does not let us actually access
>>>> (and thus suspend) the hardware anymore by the time the suspend 
>>>> function
>>>> is called.
>>>>
>>>> Guenter
>>>>
>>>>> This indicates that the issue is not caused by spd5118 suspend/resume
>>>>> behavior, but by the unavailability of the
>>>>> SMBus controller due to platform or firmware interactions during
>>>>> s2idle transitions.
>>>>>
>>>>> Given this, I agree that patch 3/3 does not address the root cause 
>>>>> and
>>>>> does not provide a justified improvement.
>>>>> I am therefore fine with dropping it.
>>>>>
>>>>> Thank you for pointing this out.
>>>>>
>>> Hi Guenter,
>>>
>>> Thanks for the continued review and for questioning the earlier
>>> direction — that helped narrow this down properly.
>>>
>>> After enabling full i801 debug logging (included below in this email)
>>> and inspecting both drivers, it became clear that the resume
>>> failures are not caused by spd5118 accessing the hardware too
>>> early, nor by PM ordering issues. Instead, the SMBus controller
>>> explicitly reports “SPD Write Disable is set”, and any
>>> block write transactions to the SPD device consistently fail with
>>> DEV_ERR. spd5118 merely propagates the resulting -ENXIO.
>>
>> Oh no, this likely happens even when merely reading values, as the 
>> spd5118
>> uses a page register to switch between different register pages. In 
>> order
>> to access temperature data (page 0), you might already have to issue a
>> write access to the page register. The only reason why it works for you
>> is that the spd5118 likely already has page 0 selected by the system 
>> firmware
>> during boot.
>>
>
> Exactly. There is no guarantee that page 0 is selected.
>
>>> With that in mind, I have dropped the earlier patch that attempted
>>> to remove hardware access from the suspend path
>>> unconditionally.
>>> That patch does not address the root cause and is no longer
>>> part of the series.
>>>
>>> I am instead proposing a minimal 2-patch series:
>>>
>>> 1/2 records whether the platform enforces SPD write disable at probe
>>> time (no behavior change).
>>> 2/2 avoids regcache writeback during suspend/resume when the device
>>> operates in read-only mode, while still allowing read access to
>>> temperature inputs.
>>>
>>> This avoids issuing SMBus transactions that are architecturally
>>> blocked on these systems, and does not rely on
>>> delays or PM ordering assumptions, and leaves behavior unchanged on
>>> platforms where SPD writes are permitted.
>>>
>>> If this direction looks acceptable, I’m happy to re-spin and post the
>>> series formally.
>>>
>>> Thanks again for the guidance.
>>
>> I do not know if this is a reliable solution, as the system firmware 
>> might
>> select a different register page during resume. This will then 
>> prevent the
>> driver from functioning.
>>
>
> No, it is not reliable. The driver is simply not usable in this scenario.
> This isn't just the temperature sensor code - the eeprom code is affected
> as well.
>
Ok.

>> I would love to see the spd5118 driver working on such systems with 
>> reduced
>> functionality, but i will leave it to Guenter to decide if this 
>> approach is
>> maintainable.
>>
>> Besides that: did the spd5118 driver load automatically on your device?
>>
>
> I thought that was disabled. The i801 driver is supposed to detect if 
> write
> protect is enabled and, if so, it is supposed to not instantiate the 
> spd5118
> driver for DDR3. Support for this was added with commit 4d6d35d3417d 
> ("i2c:
> smbus: introduce Write Disable-aware SPD instantiating functions"). 
> Apparently
> the code to do this never made it into the i801 driver.
>
> The i801 driver needs to be fixed to inform the spd initialization code
> that the spd5118 address range is write protected. The patch to do 
> this was
> "i2c: i801: Do not instantiate spd5118 under SPD Write Disable". I 
> have no idea
> why that patch didn't make it upstream.
>
> Guenter
>
Good question, do you want to send the message to the i2c maintainers about this
or should i do it?

Thanks,
Armin Wolf


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ