Message-ID: <96d660bc-17ab-4e0e-9a94-bce1737a8da1@roeck-us.net>
Date: Sun, 7 Mar 2021 16:31:02 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Chris Packham <Chris.Packham@...iedtelesis.co.nz>,
"jdelvare@...e.com" <jdelvare@...e.com>
Cc: "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-i2c@...r.kernel.org" <linux-i2c@...r.kernel.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: Errant readings on LM81 with T2080 SoC
On 3/7/21 2:52 PM, Chris Packham wrote:
> Hi,
>
> I've got a system using a PowerPC T2080 SoC and among other things has
> an LM81 hwmon chip.
>
> Under a high CPU load we see errant readings from the LM81 as well as
> actual failures. It's the errant readings that cause the most concern
> since we can easily ignore the read errors in our monitoring application
> (although it would be better if they weren't there at all).
>
> I'm able to reproduce this with a test application[0] that artificially
> creates a high CPU load, and by repeatedly checking for the all-1s
> values from the LM81 datasheet[1] (page 17). The all-1s readings stick
> out as they are obviously higher than the voltage rails that are
> connected and disagree with measurements taken with a multimeter.
>
> Here's the output from my device
>
> [root@...uxbox ~]# cpuload 90&
> [root@...uxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input
> | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
> 3586
> 3586
> cat: read error: No such device or address
> cat: read error: No such device or address
> 3320
> 3320
> 3586
> 3586
> 6641
> 6641
> 4383
> 4383
>
> Fundamentally I think this is a problem with the fact that the LM81 is
> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we
> emulate SMBus. I suspect the errant readings are when we don't get round
> to completing the read within the timeout specified by the SMBus
> specification. Depending on when that happens we either fail the
> transfer or interpret the result as all-1s.
>
That is quite unlikely. Many sensor chips are SMBus chips connected to
i2c busses. It is much more likely that there is a bug in the T2080 i2c driver,
that the chip doesn't like the bulk read command issued through regmap, that
the chip has problems with the i2c bus speed, or that the i2c bus is noisy.
In this context, the "No such device or address" responses are very suspicious.
Those are reported by the i2c driver, not by the hwmon driver, and suggest
that the chip did not respond to a read request. It may help to enable
debugging in the i2c driver to see if it reports anything useful. Even
better might be to connect an i2c bus analyzer to the i2c bus and check
what is going on.
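
To tell the two failure modes apart while testing, it may help to count them
separately instead of grepping the output. The following is only a minimal
sketch, not something from the original report: the function names classify()
and poll() are made up for illustration, the sysfs path and the list of
suspect values are copied from the shell loop quoted above, and the only
assumption about the errors is that "No such device or address" is the
message for errno ENXIO.

```python
import errno
import glob
import time
from collections import Counter

# Millivolt values the quoted shell loop greps for (the all-1s readings).
SUSPECT_MV = {3320, 4383, 6641, 15930, 3586}


def classify(path, suspects=SUSPECT_MV):
    """Read one sysfs attribute; return 'ok', 'suspect', 'unparsable',
    or 'error:<ERRNO NAME>' (e.g. 'error:ENXIO' for a failed transfer)."""
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except OSError as e:
        # Read errors are kept separate from suspicious values.
        return "error:" + errno.errorcode.get(e.errno, str(e.errno))
    except ValueError:
        return "unparsable"
    return "suspect" if value in suspects else "ok"


def poll(pattern="/sys/class/hwmon/hwmon0/in*_input", rounds=60, delay=1.0):
    """Classify every matching attribute once per round and tally the results."""
    counts = Counter()
    for _ in range(rounds):
        for path in sorted(glob.glob(pattern)):
            counts[classify(path)] += 1
        time.sleep(delay)
    return counts
```

Running print(poll()) on the target with and without the CPU load gives a
count of "suspect" readings next to a count of "error:ENXIO" failures, which
can then be compared against whatever the i2c driver debug output or the bus
analyzer shows.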
Guenter