lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9ace6290-8ef8-4158-b48d-fcd841910a23@roeck-us.net>
Date: Mon, 12 Jan 2026 10:07:43 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Armin Wolf <W_Armin@....de>, TINSAE TADESSE <tinsaetadesse2015@...il.com>
Cc: linux-hwmon@...r.kernel.org, linux-kernel@...r.kernel.org,
 bhelgaas@...gle.com, linux-pci@...r.kernel.org
Subject: Re: [PATCH 1/3] hwmon: spd5118: Do not fail resume on temporary I2C
 errors

On 1/12/26 09:41, Armin Wolf wrote:
> Am 12.01.26 um 12:48 schrieb TINSAE TADESSE:
> 
>> On Sun, Jan 11, 2026 at 1:27 AM Armin Wolf <W_Armin@....de> wrote:
>>> Am 10.01.26 um 18:19 schrieb Tinsae Tadesse:
>>>
>>>> SPD5118 DDR5 temperature sensors may be temporarily unavailable
>>>> during s2idle resume. Ignore temporary -ENXIO and -EIO errors during resume and allow
>>>> register synchronization to be retried later.
>>> Hi,
>>>
>>> do you know if the error is caused by the SPD5118 device itself or by the underlying
>>> i2c controller? Please also share the output of "acpidump" and the name of the i2c
>>> controller used to communicate with the SPD5118.
>>>
>>>> Signed-off-by: Tinsae Tadesse <tinsaetadesse2015@...il.com>
>>>> ---
>>>>    drivers/hwmon/spd5118.c | 8 +++++++-
>>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/hwmon/spd5118.c b/drivers/hwmon/spd5118.c
>>>> index 5da44571b6a0..ec9f14f6e0df 100644
>>>> --- a/drivers/hwmon/spd5118.c
>>>> +++ b/drivers/hwmon/spd5118.c
>>>> @@ -512,9 +512,15 @@ static int spd5118_resume(struct device *dev)
>>>>    {
>>>>        struct spd5118_data *data = dev_get_drvdata(dev);
>>>>        struct regmap *regmap = data->regmap;
>>>> +     int ret;
>>>>
>>>>        regcache_cache_only(regmap, false);
>>>> -     return regcache_sync(regmap);
>>>> +     ret = regcache_sync(regmap);
>>>> +     if(ret == -ENXIO || ret == -EIO) {
>>>> +             dev_warn(dev, "SPD hub not responding on resume (%d), deferring init\n", ret);
>>>> +             return 0;
>>>> +     }
>>> The specification says that the SPD5118 might take up to 10ms to initialize its i2c interface
>>> after power on. Can you test if simply waiting for 10ms before syncing the regcache solves this
>>> issue?
>>>
>>> Thanks,
>>> Armin Wolf
>>>
>>>> +     return ret;
>>>>    }
>>>>
>>>>    static DEFINE_SIMPLE_DEV_PM_OPS(spd5118_pm_ops, spd5118_suspend, spd5118_resume);
>> Hi Armin,
>>
>>> Do you know if the error is caused by the SPD5118 device itself or by the underlying i2c controller?
>> The error appears to be caused by the underlying I2C controller and platform
>> power sequencing rather than by the SPD5118 device itself.
>>
>> The failure manifests as a temporary -ENXIO occurring only during s2idle
>> resume. The SPD5118 temperature sensor works correctly before suspend and
>> after resume once the bus becomes available again. This indicates that the
>> driver’s resume callback may be invoked before the I2C controller or firmware
>> has fully re-enabled access to the SPD hub.
>>
>>> Please also share the output of "acpidump" and the name of the i2c
>> controller used to communicate with the SPD5118.
>>
>> I have attached the output of acpidump as requested.
>> The SPD5118 is connected via I2C bus 14 and accessed through the Intel
>> I801 SMBus controller (0000:00:1f.4), which is ACPI-managed.
> 
> Interesting, the ACPI code seems to do two things when the i2c controller suspends (aka is put into D3):
> 
> 1. A unknown register 0x84 ("PMEC") is modified
> 2. The PCI BAR of the i2c controller is disabled
> 
> Since the PCI bar is not re-enabled during resume, i suspect that either the firmware
> is buggy or that the firmware relies on the operating system to restore any BAR settings
> when resuming.
> 
> I do not know how the PCI core handles suspend, so i CCed the associated maintainers.
> 
>>> Can you test if simply waiting for 10ms before syncing the regcache solves this
>> issue?
>>
>> I tested adding an explicit msleep(10) in spd5118_resume() before calling
>> regcache_sync(), for the I2C interface to become ready after power-on.
>> With this delay in place, the resume failures (-ENXIO during regcache sync)
>> no longer occur, and repeated suspend/resume cycles are completed successfully.
>>
>> However, relying on a fixed delay in the resume path is not robust and would
>> not be suitable across platforms with different firmware and power sequencing.
>> It also still performs hardware I/O during PM resume.
> 
> In this case the 10 ms delay is OK since the specification of the SPD5118 device explicitly
> states that the device needs those 10ms to become operational after loosing power.
> 

I would agree, but it seems to be quite unlikely that this is the actual problem
since that would mean that the SPD1118 is actually accessed less than 10ms after
it was powered on. That is unlikely even if the power rail to the SPD chip is separate
from the power rail to the DDR itself. On top of that, pretty much every chip has
similar restrictions. I really don't want to have to add similar code to every driver
because a power sequencer somewhere has a problem and doesn't add the required delay
after enabling power to a chip (or disables power to a chip before it is properly
suspended). We have to make _some_ assumptions in the kernel, and one of those
assumptions is that the chip is still accessible on suspend, and that it is accessible
on resume.

Worse - if the problem is the i2c controller, there is no guarantee that the 10ms
delay is sufficient. It may be sufficient in this case, but if the resume order changes
and the i2c controller is re-enabled even later than it is today, we'd end up in the
same situation. Either case, if the i2c controller is indeed the problem, trying to
work around that problem in the driver of a device attached to it is just wrong.

Guenter


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ