lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ12PfNg_q5m8tLo_ofVgE7fTr4uVDzVuLSTe202A_8ygRgQAw@mail.gmail.com>
Date: Tue, 13 Jan 2026 22:33:53 +0300
From: TINSAE TADESSE <tinsaetadesse2015@...il.com>
To: Guenter Roeck <linux@...ck-us.net>
Cc: Armin Wolf <W_Armin@....de>, linux-hwmon@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] hwmon: spd5118: Do not fail resume on temporary I2C errors

On Mon, Jan 12, 2026 at 10:11 PM Guenter Roeck <linux@...ck-us.net> wrote:
>
> On 1/12/26 10:22, Armin Wolf wrote:
> > Am 12.01.26 um 19:17 schrieb Guenter Roeck:
> >
> >> On 1/12/26 09:46, Armin Wolf wrote:
> >>> Am 12.01.26 um 17:36 schrieb Guenter Roeck:
> >>>
> >>>> On 1/10/26 14:27, Armin Wolf wrote:
> >>>>> Am 10.01.26 um 18:19 schrieb Tinsae Tadesse:
> >>>>>
> >>>>>> SPD5118 DDR5 temperature sensors may be temporarily unavailable
> >>>>>> during s2idle resume. Ignore temporary -ENXIO and -EIO errors during resume and allow
> >>>>>> register synchronization to be retried later.
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> do you know if the error is caused by the SPD5118 device itself or by the underlying
> >>>>> i2c controller? Please also share the output of "acpidump" and the name of the i2c
> >>>>> controller used to communicate with the SPD5118.
> >>>>>
> >>>>>> Signed-off-by: Tinsae Tadesse <tinsaetadesse2015@...il.com>
> >>>>>> ---
> >>>>>>   drivers/hwmon/spd5118.c | 8 +++++++-
> >>>>>>   1 file changed, 7 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/hwmon/spd5118.c b/drivers/hwmon/spd5118.c
> >>>>>> index 5da44571b6a0..ec9f14f6e0df 100644
> >>>>>> --- a/drivers/hwmon/spd5118.c
> >>>>>> +++ b/drivers/hwmon/spd5118.c
> >>>>>> @@ -512,9 +512,15 @@ static int spd5118_resume(struct device *dev)
> >>>>>>   {
> >>>>>>       struct spd5118_data *data = dev_get_drvdata(dev);
> >>>>>>       struct regmap *regmap = data->regmap;
> >>>>>> +    int ret;
> >>>>>>       regcache_cache_only(regmap, false);
> >>>>>> -    return regcache_sync(regmap);
> >>>>>> +    ret = regcache_sync(regmap);
> >>>>>> +    if(ret == -ENXIO || ret == -EIO) {
> >>>>>> +        dev_warn(dev, "SPD hub not responding on resume (%d), deferring init\n", ret);
> >>>>>> +        return 0;
> >>>>>> +    }
> >>>>>
> >>>>> The specification says that the SPD5118 might take up to 10ms to initialize its i2c interface
> >>>>> after power on. Can you test if simply waiting for 10ms before syncing the regcache solves this
> >>>>> issue?
> >>>>>
> >>>>
> >>>> It seems to be highly unlikely that this code executes within 10ms of powering on the memory.
> >>>>
> >>>> Guenter
> >>>>
> >>> AFAIK the 10ms are associated with the VDDIO supply, the VDDSPD main supply is different from that.
> >>> I just want to test if this device disables VDDIO during suspend-to-idle.
> >>>
> >>> I have another theory: if the SPD5118 somehow looses power, we might still need to manually put the
> >>> device into 16-bit address mode using standard 8-bit i2c commands.
> >>>
> >>
> >> Uh, no, we can not do that. I tried. Changing the access mode causes bad hiccups at least
> >> with some boards. They interpret that as a memory configuration change, and the next warm
> >> reboot will end up in the BIOS. Then, after the RAM configuration is updated, a cold reboot
> >> will again detect a configuration change and BIOS will be entered again.
> >>
> >> That does make me wonder how the problem shows up in the first place, since the BIOS
> >> usually does access the SPD5118 during resume, at least on my systems with DDR5. Granted,
> >> those are with AMD CPUs, but I would assume that Intel BIOS versions are not different.
> >>
> >> Guenter
> >>
> > During suspend-to-idle the RAM stays active, so the firmware does not really need to access the SPD device.
> > I meant that if the SPD device is configured during boot to operate in 16-bit mode and looses power during
> > suspend-to-idle, the firmware might not reconfigure the SPD to continue operate in 16-bit mode after resume.
> >
>
> This would be a severe BIOS/Firmware problem. I'd really want to see evidence that
> restoring SPD5118_REG_I2C_LEGACY_MODE is necessary on resume before touching that
> register. Even then I'd want to see evidence that touching it doesn't cause
> problems on a variety of boards before actually doing it. After all, we would not
> know if the register was reconfigured by the firmware for some reason, or if the
> chip lost power and the firmware didn't handle it.
>
> Having said that, even if there is evidence that the chip can lose power and end up
> in the wrong mode on resume, I'd rather check for that condition, issue a WARN_ONCE(),
> and disable the driver instead of touching its configuration.
>
> Sorry, but from my experience SPD5118 access in general is quite fragile, and not
> touching its configuration was the only way I could find to make it work reliably.
>
> Guenter
>

I experimented by configuring the driver to observe
SPD5118_REG_I2C_LEGACY_MODE and SPD5118_REG_TEMP_CONFIG registers
before suspend and immediately after resume, prior to any
driver-initiated writes.
Across multiple suspend-to-idle cycles, register values remained unchanged,
indicating that the SPD5118 does not lose configuration and
does not require reprogramming of legacy/16-bit mode on resume.
Resume failures correlate with temporary I2C unavailability rather
than loss of device state.

Since the device configuration remains intact across resume and the
failure correlates with I2C availability,
restoring SPD5118_REG_I2C_LEGACY_MODE is neither required nor justified.
For this reason, I agree that restoring this register would be unsafe
without broader platform testing.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ