lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <21914890.EfDdHjke4D@pliszka>
Date:   Wed, 07 Jun 2023 08:01:14 +0200
From:   Sebastian Krzyszkowiak <sebastian.krzyszkowiak@...i.sm>
To:     Peng Fan <peng.fan@....com>,
        "Peng Fan (OSS)" <peng.fan@....nxp.com>,
        "rafael@...nel.org" <rafael@...nel.org>,
        "shawnguo@...nel.org" <shawnguo@...nel.org>,
        "s.hauer@...gutronix.de" <s.hauer@...gutronix.de>,
        Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:     "amitk@...nel.org" <amitk@...nel.org>,
        "rui.zhang@...el.com" <rui.zhang@...el.com>,
        "andrew.smirnov@...il.com" <andrew.smirnov@...il.com>,
        "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kernel@...gutronix.de" <kernel@...gutronix.de>,
        "festevam@...il.com" <festevam@...il.com>,
        dl-linux-imx <linux-imx@....com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        Alice Guo <alice.guo@....com>
Subject: Re: [PATCH 2/3] thermal: qoriq_thermal: only enable supported sensors

On piÄ…tek, 2 czerwca 2023 15:11:37 CEST Daniel Lezcano wrote:
> On 01/06/2023 11:52, Peng Fan wrote:
> > Hi Daniel,
> > 
> >> Subject: RE: [PATCH 2/3] thermal: qoriq_thermal: only enable supported
> >> sensors
> >> 
> >>> Subject: Re: [PATCH 2/3] thermal: qoriq_thermal: only enable supported
> >>> sensors
> >>> 
> >>> On 31/05/2023 14:05, Peng Fan wrote:
> >>>>> Subject: Re: [PATCH 2/3] thermal: qoriq_thermal: only enable
> >>>>> supported sensors
> >>>>> 
> >>>>> On 16/05/2023 10:37, Peng Fan (OSS) wrote:
> >>>>>> From: Peng Fan <peng.fan@....com>
> >>>>>> 
> >>>>>> There are MAX 16 sensors, but not all of them supported. Such as
> >>>>>> i.MX8MQ, there are only 3 sensors. Enabling all 16 sensors will
> >>>>>> touch reserved bits from i.MX8MQ reference mannual, and TMU will
> >>>>>> stuck, temperature will not update anymore.
> >>>>>> 
> >>>>>> Fixes: 45038e03d633 ("thermal: qoriq: Enable all sensors before
> >>>>>> registering them")
> >>>>>> Signed-off-by: Peng Fan <peng.fan@....com>
> >>>>>> ---
> >>>>>> 
> >>>>>>     drivers/thermal/qoriq_thermal.c | 30
> >>>>>>     +++++++++++++++++++----------
> >> 
> >> -
> >> 
> >>>>>>     1 file changed, 19 insertions(+), 11 deletions(-)
> >>>>>> 
> >>>>>> diff --git a/drivers/thermal/qoriq_thermal.c
> >>>>>> b/drivers/thermal/qoriq_thermal.c index
> >> 
> >> b806a0929459..53748c4a5be1
> >> 
> >>>>>> 100644
> >>>>>> --- a/drivers/thermal/qoriq_thermal.c
> >>>>>> +++ b/drivers/thermal/qoriq_thermal.c
> >>>>>> @@ -31,7 +31,6 @@
> >>>>>> 
> >>>>>>     #define TMR_DISABLE	0x0
> >>>>>>     #define TMR_ME		0x80000000
> >>>>>>     #define TMR_ALPF	0x0c000000
> >>>>>> 
> >>>>>> -#define TMR_MSITE_ALL	GENMASK(15, 0)
> >>>>>> 
> >>>>>>     #define REGS_TMTMIR	0x008	/* Temperature measurement
> >>>>> 
> >>>>> interval Register */
> >>>>> 
> >>>>>>     #define TMTMIR_DEFAULT	0x0000000f
> >>>>>> 
> >>>>>> @@ -105,6 +104,11 @@ static int tmu_get_temp(struct
> >>>>> 
> >>>>> thermal_zone_device *tz, int *temp)
> >>>>> 
> >>>>>>     	 * within sensor range. TEMP is an 9 bit value representing
> >>>>>>     	 * temperature in KelVin.
> >>>>>>     	 */
> >>>>>> 
> >>>>>> +
> >>>>>> +	regmap_read(qdata->regmap, REGS_TMR, &val);
> >>>>>> +	if (!(val & TMR_ME))
> >>>>>> +		return -EAGAIN;
> >>>>> 
> >>>>> How is this change related to what is described in the changelog?
> >>>> 
> >>>> devm_thermal_zone_of_sensor_register will invoke get temp, since we
> >>>> reverted the 45038e03d633 did, we need to check TMR_ME to avoid
> >>> 
> >>> return
> >>> 
> >>>> invalid temperature.
> >>>> 
> >>>   From a higher perspective if the sensor won't be enabled, then the
> >>> 
> >>> thermal zone should not be registered, the get_temp won't happen on a
> >>> disabled sensor and this test won't be necessary, no ?
> > 
> > After thinking more, I'd prefer current logic.
> > 
> > We rely on devm_thermal_of_zone_register's return value to know
> > whether there is a valid zone, then set sites bit, and after collected
> > all site bits, we enable the thermal IP.
> > 
> > If move the enabling thermal IP before devm_thermal_of_zone_register,
> > We need check dtb thermal zone, to know which zone is valid for current
> > thermal IP. This would complicate the design.
> > 
> > So just checking the enabling bit in get temperature would be much
> > simpler, and there just a small window before enabling thermal IP.
> 
> If the thermal zone is not described, then the thermal zone won't be
> created as it fails with -ENODEV and thus get_temp won't be called on a
> disabled site, right?

That's not what the problem is. It's about zones that *will* be created - 
since the driver only knows that a thermal zone isn't described in the device 
tree after it fails registering, it can't enable the site *before* the zone 
gets registered - so it happens afterwards. That's why it needs this check to 
not return a bogus initial value before the site gets actually enabled later 
in qoriq_tmu_register_tmu_zone.

> Having test in the get_temp() ops is usually the sign there is something
> wrong with the driver initialization.

I've sent a patch that solved this exact problem back in 2021 by checking 
whether a zone is described first with thermal_zone_of_get_sensor_id, but it 
was sitting out there ignored long enough that the function got removed from 
the kernel in 5d10f480f77b and I'm not exactly sure how to solve it cleanly 
without getting it back. Without a replacement, what Peng did seems like the 
only way to not reintroduce the problem 45038e03d633 was supposed to fix.

The patch: https://lore.kernel.org/linux-pm/20220301033402.415445-1-sebastian.krzyszkowiak@puri.sm/

Questions on how to proceed further: https://lore.kernel.org/linux-pm/
5800115.iIbC2pHGDl@...szka/

Would be nice to get this finally resolved, the mainline TMU driver has been 
completely unreliable on i.MX8MQ for years now.

Thanks,
Sebastian


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ