lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <trinity-fc0082c7-cf37-4458-ac72-3c5774702f5d-1694625368681@3c-app-gmx-bs38>
Date:   Wed, 13 Sep 2023 19:16:08 +0200
From:   Frank Wunderlich <frank-w@...lic-files.de>
To:     AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>
Cc:     Frank Wunderlich <linux@...web.de>,
        linux-mediatek@...ts.infradead.org,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Amit Kucheria <amitk@...nel.org>,
        Zhang Rui <rui.zhang@...el.com>,
        Rob Herring <robh+dt@...nel.org>,
        Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
        Conor Dooley <conor+dt@...nel.org>,
        Matthias Brugger <matthias.bgg@...il.com>,
        linux-pm@...r.kernel.org, devicetree@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        Daniel Golle <daniel@...rotopia.org>
Subject: Aw: Re: [RFC v1 3/3] thermal/drivers/mediatek/lvts_thermal: add
 mt7988 support

Hi,
> Gesendet: Mittwoch, 13. September 2023 um 13:43 Uhr
> Von: "AngeloGioacchino Del Regno" <angelogioacchino.delregno@...labora.com>
> An: frank-w@...lic-files.de, "Frank Wunderlich" <linux@...web.de>, linux-mediatek@...ts.infradead.org
> Il 13/09/23 12:52, Frank Wunderlich ha scritto:
> > Am 13. September 2023 10:16:51 MESZ schrieb AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>:
> > Hi angelo,
> > 
> > thanks for first look
> > 
> >> Il 11/09/23 20:33, Frank Wunderlich ha scritto:
> >>> From: Frank Wunderlich <frank-w@...lic-files.de>
> >>>
> >>> Add Support for mediatek fologic 880/MT7988.
> >>>
> >>> Signed-off-by: Frank Wunderlich <frank-w@...lic-files.de>
> >>> ---
> >>>    drivers/thermal/mediatek/lvts_thermal.c | 73 +++++++++++++++++++++++++
> >>>    1 file changed, 73 insertions(+)
> >>>
> >>> diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
> >>> index c1004b4da3b6..48b257a3c80e 100644
> >>> --- a/drivers/thermal/mediatek/lvts_thermal.c
> >>> +++ b/drivers/thermal/mediatek/lvts_thermal.c
> >>> @@ -82,6 +82,8 @@
> >>>    #define LVTS_GOLDEN_TEMP_DEFAULT	50
> >>>    #define LVTS_COEFF_A_MT8195			-250460
> >>>    #define LVTS_COEFF_B_MT8195			250460
> >>> +#define LVTS_COEFF_A_MT7988			-204650
> >>> +#define LVTS_COEFF_B_MT7988			204650
> >>>      #define LVTS_MSR_IMMEDIATE_MODE		0
> >>>    #define LVTS_MSR_FILTERED_MODE		1
> >>> @@ -1272,6 +1274,67 @@ static int lvts_remove(struct platform_device *pdev)
> >>>    	return 0;
> >>>    }
> >>>    +/*
> >>> + * LVTS MT7988
> >>> + */
> >>> +#define LVTS_HW_SHUTDOWN_MT7988	117000
> >>
> >> Are you sure that this chip's Tj is >117°C ?!
> >>
> >> Looks a bit high... if it is exactly 117°C, I would suggest cutting earlier,
> >> either at 110 (safe side) or 115: after all, this is a life-saver feature and
> >> the chip is actually never meant to *constantly* work at 110°C (as it would
> >> degrade fast and say goodbye earlier than "planned").
> > 
> > I took values from SDK
> > 
> > https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/target/linux/mediatek/files-5.4/drivers/thermal/mediatek/soc_temp_lvts.c#1483
> > 
> 
> That kernel also defines 117°C for MT8195, which leaves me a bit dubious.
> 
> For safety, I would recommend using the same 105°C hw shutdown temperature
> for 7988 as well: after all if you think about it, reaching that kind of
> temperature means that there's a real problem (fan stopped working and/or
> heatsink detached) that needs attention.

ack, will change to same value and put the define on top, abve the mt8195

> This is because you'll have trip points in devicetree that should throttle
> the CPU in case it reaches at least a dangerously high temperature (read:
> a temperature outside the recommended continuous operation range), bringing
> the temperatures down because of the throttling action; I would imagine
> throttling the CPU a bit down at 80°C, then a bit more at 90°C - but then,
> if the temps won't drop like that, there's clearly a HW-related issue that
> must be addressed (like the fan/heatsink scenario that I just described).
> 
> Though, take this as an advice; I'll respect whichever decision you'll take.
> 
> >>> +//enum mt7988_lvts_domain { MT7988_AP_DOMAIN, MT7988_NUM_DOMAIN };
> >>> +
> >>> +enum mt7988_lvts_sensor_enum {
> >>> +	MT7988_TS3_0,
> >>> +	MT7988_TS3_1,
> >>> +	MT7988_TS3_2,
> >>> +	MT7988_TS3_3,
> >>> +	MT7988_TS4_0,
> >>> +	MT7988_TS4_1,
> >>> +	MT7988_TS4_2,
> >>> +	MT7988_TS4_3,
> >>> +	MT7988_NUM_TS
> >>> +};
> > 
> >> This enumeration should be definitions in bindings (mediatek,lvts-thermal.h).
> >>
> >> Besides, the LVTS is about internal temperatures, so those TS3_x and 4_x can
> >> be renamed like what was done for MT8192 and MT8195: this is because you will
> >> never see TS3_2 being CPU2 on a board and CPU4 on another, being those - again -
> >> internal to the SoC, hence unchangeable.
> > 
> > Right these sensors are internally only and i took naming from sdk to avoid confusion. And i have not more information about these internal sensors (special meaning),but their values are packed together to get the resulting (average) temperature.
> > 
> >> Another reason is that you'll anyway have to refer to those sensors in the
> >> devicetree to configure thermal trips and such, so... :-)
> > 
> > In device tree it will look like this:
> > 
> > https://github.com/frank-w/BPI-Router-Linux/blob/6.5-lvts/arch/arm64/boot/dts/mediatek/mt7988a.dtsi#L771
> > 
> > Daniel has also defined thermal trips there,but these are untested atm. I only verified temperature itself i get from sysfs as far as i can (start at ~40°C and reaching ~70 while running).
> > 
> 
> Check how it's done in mt8192.dtsi and mt8195.dtsi: we're using the definitions
> from the bindings for thermal zones.
> At least for consistency (apart from other even more valid considerations), that's
> how it should be done: please do it like that.
> 
> Besides, I think that you can definitely guess what `TS` is CPU related: checking
> the driver and devicetree from the downstream kernel that you provided, what I
> understand is:
> 
> 1. Your naming TS3,4 corresponds to TS2,3 downstream (so it's wrong?)
> 2. Downstream TS2 is related to the CPU cores, so it should be
>     - TS2_0 - CPU0
>     - TS2_1 - CPU1
>     - TS2_2 - CPU2
>     - TS2_3 - CPU3

i took the names from efuse names (which are listed as comment above), had 2/3 before.
need to ask mtk here. same for the second controller and if it is really a "lvts-ap".

> The other set of thermal sensors seem to be completely unused, so we cannot guess
> just by looking at the code. Since you have the hardware, you may be able to take
> a position about what they are by checking what sensor heats up for each "action"
> (could be ethernet controller or infra or whatever else); if in doubt, just leave
> them out, but add a comment saying that there are more sensors and say where, so
> that if anyone finds what they're for, it'll be easy to add them.
> 
> In any case, adding thermal sensors that you don't know about is meaningless, as
> the sense of all this is:
>   - Monitoring temperatures of the hardware
>   - Taking action to prevent temperature overrun
> but if you don't know what you're reading, you can't interpret temperatures for
> something unknown, hence you can't as well take action to prevent overruns, as
> you won't know what's the maximum temperature for "unknown thing" :-)

yes i have only the downstream kernel and some additional information like interupt
number and the efuse offsets for calibration. And for the 8 sensors i get this information:

"8 sensors are distributed across the whole soc in order to get the correct average temperature.
chip designer didn’t disclose more detailed info to us."

so i thought all 8 sensors are taken into account with both controllers to calculate 1
resulting temperature. In downstream i saw also only 1 controller in dts and i have
only 1 interrupt.

https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/target/linux/mediatek/files-5.4/arch/arm64/boot/dts/mediatek/mt7988.dtsi#579

and the compatible for both (mt7988 and mt8195) are without "ap" there, so i took the same.

Where have you seen the mt7988-AP ?

i have only seen 1 sensor which should be the SOC itself and this chip has metal surface
which makes it hard to get real temperature (infrared thermometer gives wrong temp).

> Regards,
> Angelo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ