Message-ID: <96d660bc-17ab-4e0e-9a94-bce1737a8da1@roeck-us.net>
Date:   Sun, 7 Mar 2021 16:31:02 -0800
From:   Guenter Roeck <linux@...ck-us.net>
To:     Chris Packham <Chris.Packham@...iedtelesis.co.nz>,
        "jdelvare@...e.com" <jdelvare@...e.com>
Cc:     "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-i2c@...r.kernel.org" <linux-i2c@...r.kernel.org>,
        "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: Errant readings on LM81 with T2080 SoC

On 3/7/21 2:52 PM, Chris Packham wrote:
> Hi,
> 
> I've got a system using a PowerPC T2080 SoC that, among other things, 
> has an LM81 hwmon chip.
> 
> Under a high CPU load we see errant readings from the LM81 as well as 
> actual failures. It's the errant readings that cause the most concern 
> since we can easily ignore the read errors in our monitoring application 
> (although it would be better if they weren't there at all).
> 
> I'm able to reproduce this with a test application[0] that artificially 
> creates a high CPU load, and then by repeatedly checking for the all-1s 
> values from the LM81 datasheet[1] (page 17). The all-1s readings stick 
> out as they are obviously higher than the voltage rails that are 
> connected and disagree with measurements taken with a multimeter.
> 
> Here's the output from my device
> 
> [root@...uxbox ~]# cpuload 90&
> [root@...uxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input 
> | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
> 3586
> 3586
> cat: read error: No such device or address
> cat: read error: No such device or address
> 3320
> 3320
> 3586
> 3586
> 6641
> 6641
> 4383
> 4383
> 
> Fundamentally I think this is a problem with the fact that the LM81 is 
> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we 
> emulate SMBus. I suspect the errant readings are when we don't get round 
> to completing the read within the timeout specified by the SMBus 
> specification. Depending on when that happens we either fail the 
> transfer or interpret the result as all-1s.
> 

That is quite unlikely. Many sensor chips are SMBus chips connected to
i2c busses. It is much more likely that there is a bug in the T2080 i2c driver,
that the chip doesn't like the bulk read command issued through regmap, that
the chip has problems with the i2c bus speed, or that the i2c bus is noisy.

In this context, the "No such device or address" responses are very suspicious.
Those are reported by the i2c driver, not by the hwmon driver, and suggest
that the chip did not respond to a read request. It may help to enable
debugging in the i2c driver to see whether it reports anything useful. Even
better would be to connect an i2c bus analyzer to the i2c bus and check
what is going on.
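
To count the two failure modes separately while debugging, a small poller
along these lines might help. This is only a sketch: the function names
classify() and poll_once() are made up for illustration, the sysfs path and
the set of all-1s values are taken from the quoted shell loop, and treating
ENXIO as "chip did not respond" is an assumption based on the "No such
device or address" message that cat printed.

```python
import errno
import glob

# Values the quoted grep looks for (all-1s readings, in millivolts).
ALL_ONES_MV = {3320, 4383, 6641, 15930, 3586}


def classify(path):
    """Classify one read of a hwmon input file.

    Returns ('nack', None), ('error', errno), ('all_ones', mv) or ('ok', mv).
    """
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except OSError as e:
        # ENXIO is what cat reported as "No such device or address".
        if e.errno == errno.ENXIO:
            return ("nack", None)
        return ("error", e.errno)
    if value in ALL_ONES_MV:
        return ("all_ones", value)
    return ("ok", value)


def poll_once(pattern="/sys/class/hwmon/hwmon0/in*_input"):
    """Read every matching input once and count the outcomes by kind."""
    counts = {}
    for path in sorted(glob.glob(pattern)):
        kind, _ = classify(path)
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Calling poll_once() in a loop under load and logging the counts next to the
i2c driver's debug output would show whether the all-1s readings line up
with failed transfers on the bus.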

Guenter
