[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a7a7c60f-3415-4167-8a35-ed956eb7d3ce@salutedevices.com>
Date: Fri, 5 Apr 2024 23:06:17 +0300
From: George Stark <gnstark@...utedevices.com>
To: Harald Geyer <harald@...ib.org>, <lars@...afoo.de>, <jic23@...nel.org>
CC: <linux-iio@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<kernel@...utedevices.com>
Subject: Re: [PATCH] iio: dht11: set debug log level for parsing error
messages
Hello Harald
On 3/26/24 17:03, Harald Geyer wrote:
> Hi George!
>
> Am Montag, dem 25.03.2024 um 23:18 +0300 schrieb George Stark:
>> On 3/25/24 21:48, Harald Geyer wrote:
>>> Am Montag, dem 25.03.2024 um 19:54 +0300 schrieb George Stark:
>>>> Protocol parsing errors could happen due to several reasons like
>>>> noise
>>>> environment, heavy load on system etc. If to poll the sensor
>>>> frequently
>>>> and/or for a long period kernel log will become polluted with
>>>> error
>>>> messages if their log level is err (i.e. on by default).
>>>
>>> Yes, these error are often recoverable. (As are many other HW
>>> errors,
>>> that typically are logged. Eg USB bus resets due to EMI)
>>>
>>> [...]
>>>
>>> The idea is, that these messages help users understand issues with
>>> their HW (like too long cables, broken cables etc). But it is true,
>>> that they will slowly accumulate in many real world scenarios
>>> without
>>> anything being truly wrong.
>> I agree with you that it's very convenient to just take a look to
>> dmesg
>> and see device connection problems at once. But unlike e.g. usb user
>> has
>> to actually start reading sensor to perform communication and read
>> errors will be propagated to the userspace and could be noticed \
>> handled.
>
> Not really. The log lines contain additional information useful for
> understanding the problem with the setup.
>
>> Anyway I believe we should use uniform approach for read errors -
>> currently in the driver there're already dbg messages:
>>
>> "lost synchronisation at edge %d\n"
>> "invalid checksum\n"
>
> These errors are usually caused by EMI and there isn't much to do aside
> from trying again until we find a time window with less interference.
> They are not logged, because in some cases they might be very frequent
> and can be handled by the user space client programatically anyway.
>
>> I changed log level from err to dbg for the messages:
>>
>> "Only %d signal edges detected\n"
>
> This mostly indicates a problem with the setup. Long cable, dead
> sensor, high (interrupt) load etc.
>
> Its true that this can happen during normal operation. - Usually when
> the system takes too long to enter the irq handler.
Usually there's nothing to do about it. And several late-handled
interrupts can lead to any of 4 errors.
>
> But the primary causes are:
> 1) Your wiring is broken. In this case, the message is immediately
> helpful and points you in the right direction. (Only if you understand
> the protocol though.)
> 2) Your sensor is dead or "crashed", which also warrants an error msg
> IMO.
My sensor works around 3 years 24\7 (with one reboot per month) and
still precise (for home usage at least) - I compared it with several
other home temperature sensors.
I've just checked another sensor in my setup (3.3v, cable 15cm) the
error rate is the same. So it's ether EMI or single-core slow cpu.
>
> The "crashed" case is a bit special. Some chips seem to randomly stop
> working after a couple of hours and the only remedy is to power cycle
> them. This could be done automatically. - I have the sensor power
> supply pin on a GPIO and reset it from userspace in my setup. I tried
> to work on a version of the driver some years ago, that would
> optionally register with a regulator and manage sensor resets from
> within the kernel driver. If this was actually implemented, we could
> reduce the logging to cases, where the reset didn't solve the problem.
>
> I stopped working on this, because it would have required changes to
> the regulator framework, to be actually useful, and the regulator
> maintainers didn't seem to keen about them. However, if you want to
> pick this up in an effort to reduce unnecessary error conditions and
> messages, I certainly would be happy.
>
>> "Don't know how to decode data: %d %d %d %d\n"
>
> This would indicate a sensor, that uses the same protocol but an
> unsupported data format. This is a permanent error and therefor should
> be logged IMO.
>
> I guess, if you have a bad readout due to EMI but the checksum
> accidentally matches, then you might get this message too. But this
> should be a very rare case.
It's not very rare but sure not permanent. It can happen once per
several hundreds / a thousand requests.
>
>> They all are from a single callback and say the same thing -
>> communication problem.
>
> Not really. See above.
>
>> If we make all those messages as errors it'd be great to have
>> mechanism
>> to disable them e.g. thru module parameter or somehow without
>> rebuilding
>> kernel.
>
> No. What you try to change is cosmetic at best. It certainly doesn't
> justify adding any complexity.
>
> Since Jonathan deferred to my judgment:
>
> As you can see, I did consider the trade-off between useful diagnostics
> and spamming the log carefully. So naturally I'm inclined to reject
> your proposal unless it solves an actual problem.
>
> Also people still mail me directly with bogus bug reports about the
> driver when really they have some issue with their setup. I fear, if we
> reduce diagnostics, it will increase that noise.
If you don't ask those people to send dmesg with CONFIG_DEBUG is on you
lost 2 of 4 possible errors and don't see the whole picture. Other
dev_dbg messages in this driver are rather helpful.
Anyway thanks for that very interesting information. It's not obvious at
all if you're unfamiliar with protocol and the driver. Probably it
should be put in some documentation. Without diving into driver's source
code those messages are just noise.
Still I consider all those cases are purely debug. When there no
hardware support for sensor's protocol and the kernel is not realtime
then errors are probable. But often a temp sensor application is
not required to struggle for every measurement. And it's not desirable
to add logrotate to embedded system just because of temp sensor diagnostics.
>
> So I reject your proposed changes, if they are for the sake of
> unification. I'm willing to discuss, what the most sensible trade-off
> is, but it would need to actually add to the considerations I already
> did.
>
> Best regards,
> Harald
>
>
>> Those errors can be bypassed by increasing read rate.
>>
>>>
>>> I don't consider the dmesg buffer being rotated after a month or
>>> two a
>>> bug. But I suppose this is a corner case. I'll happily accept
>>> whatever
>>> Jonathan thinks is reasonable.
>>>
>>> Best regards,
>>> Harald
>>>
>>>
>>>> Signed-off-by: George Stark <gnstark@...utedevices.com>
>>>> ---
>>>> I use DHT22 sensor with Raspberry Pi Zero W as a simple home
>>>> meteo
>>>> station.
>>>> Even if to poll the sensor once per tens of seconds after month
>>>> or
>>>> two dmesg
>>>> may become full of useless parsing error messages. Anyway those
>>>> errors are caught
>>>> in the user software thru return values.
>>>>
>>>> drivers/iio/humidity/dht11.c | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/iio/humidity/dht11.c
>>>> b/drivers/iio/humidity/dht11.c
>>>> index c97e25448772..e2cbc442177b 100644
>>>> --- a/drivers/iio/humidity/dht11.c
>>>> +++ b/drivers/iio/humidity/dht11.c
>>>> @@ -156,7 +156,7 @@ static int dht11_decode(struct dht11 *dht11,
>>>> int
>>>> offset)
>>>> dht11->temperature = temp_int * 1000;
>>>> dht11->humidity = hum_int * 1000;
>>>> } else {
>>>> - dev_err(dht11->dev,
>>>> + dev_dbg(dht11->dev,
>>>> "Don't know how to decode data: %d %d %d
>>>> %d\n",
>>>> hum_int, hum_dec, temp_int, temp_dec);
>>>> return -EIO;
>>>> @@ -239,7 +239,7 @@ static int dht11_read_raw(struct iio_dev
>>>> *iio_dev,
>>>> #endif
>>>>
>>>> if (ret == 0 && dht11->num_edges <
>>>> DHT11_EDGES_PER_READ - 1) {
>>>> - dev_err(dht11->dev, "Only %d signal edges
>>>> detected\n",
>>>> + dev_dbg(dht11->dev, "Only %d signal edges
>>>> detected\n",
>>>> dht11->num_edges);
>>>> ret = -ETIMEDOUT;
>>>> }
>>>> --
>>>> 2.25.1
>>>>
>>>
>>
>
--
Best regards
George
Powered by blists - more mailing lists