linux-kernel - Re: [PATCH 5/6] drivers/thermal/exynos: add initial Exynos 850 support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPLW+4mEDJy7hXfQEzLdHuRfTAsRVuHqxh7VC_goR90DVXo3Cg@mail.gmail.com>
Date: Wed, 24 Jul 2024 19:42:50 -0500
From: Sam Protsenko <semen.protsenko@...aro.org>
To: Mateusz Majewski <m.majewski2@...sung.com>
Cc: linux-pm@...r.kernel.org, linux-samsung-soc@...r.kernel.org, 
	devicetree@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
	linux-kernel@...r.kernel.org, Bartlomiej Zolnierkiewicz <bzolnier@...il.com>, 
	Krzysztof Kozlowski <krzk@...nel.org>, "Rafael J. Wysocki" <rafael@...nel.org>, 
	Daniel Lezcano <daniel.lezcano@...aro.org>, Zhang Rui <rui.zhang@...el.com>, 
	Lukasz Luba <lukasz.luba@....com>, Rob Herring <robh@...nel.org>, 
	Conor Dooley <conor+dt@...nel.org>, Alim Akhtar <alim.akhtar@...sung.com>
Subject: Re: [PATCH 5/6] drivers/thermal/exynos: add initial Exynos 850 support

On Wed, Jul 24, 2024 at 10:30 AM Mateusz Majewski
<m.majewski2@...sung.com> wrote:
>
> > It feels like there is an error for Exynos7 case there. Take a look at
> > this commit:
> >
> >     aef27b658b43 ("thermal: exynos: use sanitize_temp_error() in
> > exynos7_tmu_initialize()")
> >
> > I think that commit just forgets to update the shift value for Exynos7
> > properly. This code:
> >
> >     data->temp_error1 = trim_info & tmu_temp_mask;
> >     data->temp_error2 = ((trim_info >> EXYNOS_TRIMINFO_85_SHIFT) &
> >                 EXYNOS_TMU_TEMP_MASK);
> >
> > in case of Exynos7 becomes:
> >
> >     data->temp_error1 = trim_info & 0x1ff;    // mask = 9 bits
> >     data->temp_error2 = (trim_info >> 8) & 0xff;
> >
> > it contradicts itself, because it takes 9 rightmost bits for error1,
> > and then uses 1 of those bits for error2 too. It's obvious that if 9
> > bits are already used for error1, then for error2 it has to be shifted
> > by 9 bits, not 8.
> >
> > That's why I think your patch 2/6 is legit and useful on its own, and
> > it's actually a good catch on your part! But the shift value has to be
> > fixed as well (for Exynos7). It's not ideal you don't have the
> > hardware to test it, but it just screams *bug* to me :) Also, maybe we
> > can ask someone who has Exynos7 hardware to test it for us?
>
> I thought about it for a bit and finally realized that Exynos7 only
> supports one point trimming. That is why in that commit, the original
> exynos7_tmu_initialize did not do anything with temp_error2. So
> temp_error2 will never be used in Exynos7. The real "fix" I guess is to
> only calculate temp_error2 if two point trimming is used, which is
> possible with a very small reordering of exynos7_tmu_initialize. But
> then the shift value will never be reachable in Exynos7 anyway. What do
> you think about this? I feel like it's worth it to change the shift
> value just because the code is a bit confusing anyway.

Good catch! Yes, makes total sense to me. I think it's like you said,
would be better to do both:

1. For 1-point trimming architectures: don't calculate error2, to
avoid confusion
2. For 9-bit temp length architectures: always set the shift variable
to 9, again, to avoid confusion and possible bugs

As I see it, the actual reason why that confusion happened in the
first place, is that the data is not really separated from the code in
this driver. Right now exynos_tmu_match[] table contains SOC_ARCH_*
constants for each compatible, and actual features for each platform
are devised in run-time (e.g. in exynos_map_data_data() switch, and
all other places where data->soc is checked). Because of all those
"ifs" the code looks very non-linear, hard to read, bug prone, and may
even reduce the performance. A better approach would be to extract all
possible data into some const structure containing all features for
each platform, and assign that const structure to corresponding .data
for each compatible. Maybe also add .temp_length field containing 8 or
9 accordingly. Having all that done, all the platform features will be
known at compile time and collected in one place, simplifying the
actual driver's code (most of all those ifs and switches will go
away). One example of such approach is drivers/watchdog/s3c2410_wdt.c.
This way the sanitize function could look something like this:

8<------------------------------------------------------------------------------->8
static void sanitize_temp_error(struct exynos_tmu_data *data, u32 trim_info)
{
    data->temp_error1 = trim_info & data->tmu_temp_mask;
    if (!data->temp_error1 ||
         data->temp_error1 < data->min_efuse_value  ||
         data->temp_error1 > data->max_efuse_value)
             data->temp_error1 = data->efuse_value & data->tmu_temp_mask;

    if (data->cal_type == ONE_POINT_TRIMMING)
        return;

    data->temp_error2 = (trim_info >> data->tmu_temp_shift) &
data->tmu_temp_mask;
    if (!data->temp_error2)
        data->temp_error2 = (data->efuse_value >>
data->tmu_temp_shift) & data->tmu_temp_mask;
}
8<------------------------------------------------------------------------------->8

So this data driven approach doesn't leave much space for mistakes.
Anyways, I'm not asking you to do such rework, it's just my
understanding on the cause of such issues.

Thanks!