[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGXv+5HpRvsYz3ZJ_hNM=RhDHbnEZA73Xwm1YgjuWWPXhymT9w@mail.gmail.com>
Date: Thu, 20 Jun 2024 17:46:26 +0800
From: Chen-Yu Tsai <wenst@...omium.org>
To: Julien Panis <jpanis@...libre.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
Cc: Rob Herring <robh@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>, Matthias Brugger <matthias.bgg@...il.com>,
Daniel Lezcano <daniel.lezcano@...aro.org>, Nicolas Pitre <npitre@...libre.com>,
"Rafael J. Wysocki" <rafael@...nel.org>, Zhang Rui <rui.zhang@...el.com>,
Lukasz Luba <lukasz.luba@....com>, devicetree@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org, linux-pm@...r.kernel.org,
Krzysztof Kozlowski <krzk@...nel.org>
Subject: Re: [PATCH v6 4/6] arm64: dts: mediatek: mt8186: add default thermal zones
Hi,
On Mon, Jun 3, 2024 at 3:58 PM Julien Panis <jpanis@...libre.com> wrote:
>
> On 5/29/24 14:06, AngeloGioacchino Del Regno wrote:
> > Il 29/05/24 11:12, Julien Panis ha scritto:
> >> On 5/29/24 10:33, Chen-Yu Tsai wrote:
> >>> On Wed, May 29, 2024 at 4:17 PM AngeloGioacchino Del Regno
> >>> <angelogioacchino.delregno@...labora.com> wrote:
> >>>> Il 29/05/24 07:57, Julien Panis ha scritto:
> >>>>> From: Nicolas Pitre <npitre@...libre.com>
> >>>>>
> >>>>> Inspired by the vendor kernel but adapted to the upstream thermal
> >>>>> driver version.
> >>>>>
> >>>>> Signed-off-by: Nicolas Pitre <npitre@...libre.com>
> >>>>> Signed-off-by: Julien Panis <jpanis@...libre.com>
> >>>> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
> >>> I'm getting some crazy readings which would cause the machine to
> >>> immediately shutdown during boot. Anyone else see this? Or maybe
> >>> my device has bad calibration data?
> >>>
> >>> gpu_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +229.7 C
> >>>
> >>> nna_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +229.7 C
> >>>
> >>> cpu_big0_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: -7.2 C
> >>>
> >>> cpu_little2_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +157.2 C
> >>>
> >>> cpu_little0_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: -277.1 C
> >>>
> >>> adsp_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +229.7 C
> >>>
> >>> cpu_big1_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +229.7 C
> >>>
> >>> cam_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +45.4 C
> >>>
> >>> cpu_little1_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: -241.8 C
> >>
> >> It's likely that your device has bad calibration data indeed. We observed the same
> >> behavior on the mt8186 device we used (a Corsola) and finally realized that the
> >> golden temperature was 0 (device not properly calibrated).
> >>
> >> To make a comparison, we run chromiumos v5.15 and dmesg output was:
> >> 'This sample is not calibrated, fake !!'
> >> Additional debugging revealed that the golden temp was actually 0. As a result,
> >> chromiumos v5.15 does not use the calibration data. It uses some default values
> >> instead. That's why you can observe good temperatures with chromiumos v5.15
> >> even with a device that is not calibrated.
> >>
> >> This feature is not implemented in the driver upstream, so you need a device
> >> properly calibrated to get good temperatures with it. When we forced this
> >> driver using the default values used by chromiumos v5.15 instead of real calib
> >> data (temporarily, just for testing), the temperatures were good.
> >>
> >> Please make sure your device is properly calibrated: 0 < golden temp < 62.
> >>
> >
> > Wait wait wait wait.
> >
> > What's up with that calibration data stuff?
> >
> > If there's any device that cannot use the calibration data, we need a way to
> > recognize whether the provided data (read from efuse, of course) is valid,
> > otherwise we're creating an important regression here.
> >
> > "This device is unlucky" is not a good reason to have this kind of regression.
> >
> > Since - as far as I understand - downstream can recognize that, upstream should
> > do the same.
> > I'd be okay with refusing to even probe this driver on such devices for the
> > moment being, as those are things that could be eventually handled on a second
> > part series, even though I would prefer a kind of on-the-fly calibration or
> > anyway something that would still make the unlucky ones to actually have good
> > readings *right now*.
> >
> > Though, the fact that you assert that you observed this behavior on one of your
> > devices and *still decided to send that upstream* is, in my opinion, unacceptable.
> >
> > Regards,
> > Angelo
>
> I've been trying to find some more information about the criteria
> "device calibrated VS device not calibrated" because there's a
> confusing comment in downstream code (the comment does not
> match what I observe on my device). I'll send a separate patch
> to add this feature over the next few days, when I get additional
> information from MTK about this criteria.
I couldn't wait and sent a patch to provide default calibration data,
based on the values and code from the ChromeOS kernels. It seems to
work OK-ish. I get 4x degrees C on my MT8186 device.
Also, your previous patch blocking invalid efuse data has landed. So
I think this series can be relanded. What do you think, Angelo?
ChenYu
Powered by blists - more mailing lists