linux-kernel - Re: Regression: w1_therm: sysfs w1_slave sometimes report 85 degrees Celsius

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2023042645-attentive-amends-7b0b@gregkh>
Date:   Wed, 26 Apr 2023 16:01:07 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     Stefan Wahren <stefan.wahren@...e.com>
Cc:     Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>,
        Akira Shimahara <akira215corp@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Stefan Wahren <stefan.wahren@...rgebyte.com>,
        regressions@...ts.linux.dev
Subject: Re: Regression: w1_therm: sysfs w1_slave sometimes report 85 degrees
 Celsius

On Wed, Apr 26, 2023 at 03:39:15PM +0200, Stefan Wahren wrote:
> Hi,
> 
> recently we switch on our Tarragon board (i.MX6ULL) to Linux 6.1 and noticed
> that the connected 1-wire temperature sensors (w1_therm.w1_strong_pull=0)
> sometimes (~ 1 of 20 times) report 85 degrees Celsius, which is AFAIK the
> only way to report errors to the 1-wire master:
> 
> sys/bus/w1/devices/28-04168158faff# cat w1_slave
> 50 05 4b 46 7f ff 0c 10 1c : crc=1c YES
> 50 05 4b 46 7f ff 0c 10 1c t=85000
> 
> I wasn't able to reproduce this issue with the old kernel 4.9.
> 
> After that i successfully bisected the issue to this commit:
> 67b392f7b8ed ("w1_therm: optimizing temperature read timings")
> 
> Unfortunately this commit contains a lot of independent changes, which makes
> it hard to figured out the cause of this issue. So i tried to split this
> patch in seven independent changes [1]. Now i was able to bisect the cause
> further to this change [2] which seems to rework the pullup handling within
> read_therm().
> 
> Looking closer at the code change and verify it some debug messages, the
> change inverted the locking behavior (before: no pullup -> keep lock, after:
> no pullup -> release lock during sleep).
> 
> Before:
> 	if (external_power) {
> 		mutex_unlock(&dev_master->bus_mutex);
> 
> 		sleep_rem = msleep_interruptible(tm);
> 		if (sleep_rem != 0) {
> 			ret = -EINTR;
> 			goto dec_refcnt;
> 		}
> 
> 		ret = mutex_lock_interruptible(&dev_master->bus_mutex);
> 		if (ret != 0)
> 			goto dec_refcnt;
> 	} else if (!w1_strong_pullup) {
> 		sleep_rem = msleep_interruptible(tm);
> 		if (sleep_rem != 0) {
> 			ret = -EINTR;
> 			goto mt_unlock;
> 		}
> 	}
> 
> After:
> 	if (strong_pullup) { /*some device need pullup */
> 		sleep_rem = msleep_interruptible(tm);
> 		if (sleep_rem != 0) {
> 			ret = -EINTR;
> 			goto mt_unlock;
> 		}
> 	} else { /*no device need pullup */
> 		mutex_unlock(&dev_master->bus_mutex);
> 
> 		sleep_rem = msleep_interruptible(tm);
> 		if (sleep_rem != 0) {
> 			ret = -EINTR;
> 			goto dec_refcnt;
> 		}
> 
> 		ret = mutex_lock_interruptible(&dev_master->bus_mutex);
> 		if (ret != 0)
> 			goto dec_refcnt;
> 	}
> 
> I don't believe this is intended. After inverting the strong_pullup check,
> the issue wasn't reproducible on our platform anymore. But i'm not sure this
> is clean.
> 
> Best regards
> 
> #regzbot introduced: 67b392f7b8ed
> 
> [1] - https://github.com/chargebyte/linux/commits/v6.1-tarragon_w1
> [2] - https://github.com/chargebyte/linux/commit/17ca863a32a6a1bdd376959f05c954bef12fc1b5

Can you send a patch that shows the change you wish to have made here so
that you can get credit for fixing the issue?

thanks,

greg k-h