Message-ID: <a67ea323-634d-d34e-c63e-b1aaa4737b19@alliedtelesis.co.nz>
Date: Mon, 8 Mar 2021 04:37:47 +0000
From: Chris Packham <Chris.Packham@...iedtelesis.co.nz>
To: Guenter Roeck <linux@...ck-us.net>,
"jdelvare@...e.com" <jdelvare@...e.com>
CC: "linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-i2c@...r.kernel.org" <linux-i2c@...r.kernel.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: Errant readings on LM81 with T2080 SoC

On 8/03/21 3:27 pm, Chris Packham wrote:
>
> On 8/03/21 1:31 pm, Guenter Roeck wrote:
>> On 3/7/21 2:52 PM, Chris Packham wrote:
>>> Hi,
>>>
>>> I've got a system that uses a PowerPC T2080 SoC and, among other things,
>>> has an LM81 hwmon chip.
>>>
>>> Under a high CPU load we see errant readings from the LM81 as well as
>>> actual failures. It's the errant readings that cause the most concern
>>> since we can easily ignore the read errors in our monitoring application
>>> (although it would be better if they weren't there at all).
>>>
>>> I'm able to reproduce this by using a test application[0] to
>>> artificially create a high CPU load and then repeatedly checking for
>>> the all-1s values from the LM81 datasheet[1] (page 17). The all-1s
>>> readings stick out as they are obviously higher than the voltage rails
>>> that are connected and disagree with measurements taken with a
>>> multimeter.
>>>
>>> Here's the output from my device:
>>>
>>> [root@...uxbox ~]# cpuload 90&
>>> [root@...uxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
>>> 3586
>>> 3586
>>> cat: read error: No such device or address
>>> cat: read error: No such device or address
>>> 3320
>>> 3320
>>> 3586
>>> 3586
>>> 6641
>>> 6641
>>> 4383
>>> 4383
>>>
>>> Fundamentally, I think the problem is that the LM81 is an SMBus device
>>> but the T2080 (and other Freescale SoCs) uses i2c and we emulate SMBus.
>>> I suspect the errant readings occur when we don't get round to
>>> completing the read within the timeout specified by the SMBus
>>> specification. Depending on when that happens we either fail the
>>> transfer or interpret the result as all-1s.
>>>
>> That is quite unlikely. Many sensor chips are SMBus chips connected to
>> i2c busses. It is much more likely that there is a bug in the T2080 i2c
>> driver, that the chip doesn't like the bulk read command issued through
>> regmap, that the chip has problems with the i2c bus speed, or that the
>> i2c bus is noisy.
> Perhaps something gets upset when interrupt processing is delayed
> because of CPU load. I don't see the problem when there isn't a CPU
> load so I think that eliminates board issues.
>> In this context, the "No such device or address" responses are very
>> suspicious. Those are reported by the i2c driver, not by the hwmon
>> driver, and suggest that the chip did not respond to a read request.
>> Maybe it helps to enable debugging in the i2c driver to see if it
>> reports anything useful. Even better might be to connect an i2c bus
>> analyzer to the i2c bus and check what is going on.
> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll
> enable some debug and see what we get.
For the errant readings, the driver reported nothing abnormal.
For the "No such device or address" errors I saw "mpc-i2c ffe119000.i2c:
No RXAK", which matches up with the -ENXIO return.
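
As a rough illustration of the two failure modes discussed here (reads
that fail with -ENXIO versus reads that succeed but return the full-scale
value), here is a minimal Python sketch of one polling pass. It is not the
cpuload test program or anything taken from the hwmon or i2c-mpc drivers.
The names ALL_ONES_MV, classify and poll_once are made up for this
example, the millivolt values are simply the ones from the grep pattern
quoted above, and the hwmon0 path is the one used in that command.

```python
import errno
from pathlib import Path

# Full-scale (all-1s) readings in millivolts, copied from the grep pattern
# in the report. The report does not say which inN channel maps to which
# value, so a match on any channel is treated as suspect (assumption).
ALL_ONES_MV = {3320, 4383, 6641, 15930, 3586}


def classify(path):
    """Read one sysfs attribute and return (kind, value).

    kind is 'enxio' for "No such device or address", 'error' for any
    other read or parse failure, 'all_ones' for a full-scale reading,
    and 'ok' otherwise. value is None when nothing could be read.
    """
    try:
        value = int(Path(path).read_text().strip())
    except OSError as exc:
        kind = "enxio" if exc.errno == errno.ENXIO else "error"
        return kind, None
    except ValueError:
        # The file was readable but did not contain an integer.
        return "error", None
    return ("all_ones" if value in ALL_ONES_MV else "ok"), value


def poll_once(hwmon_dir="/sys/class/hwmon/hwmon0"):
    """Classify every in*_input attribute under hwmon_dir once."""
    files = sorted(Path(hwmon_dir).glob("in*_input"))
    return {f.name: classify(f) for f in files}


if __name__ == "__main__":
    for name, (kind, value) in poll_once().items():
        print(name, kind, value)
```

Calling poll_once() once a second under load and counting the 'all_ones'
and 'enxio' results separately would show how often each failure mode
occurs, which the grep loop above does not distinguish as clearly.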