lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b989522d-bd41-4d76-91a9-3cf680214003@linaro.org>
Date: Wed, 30 Apr 2025 18:56:38 +0200
From: neil.armstrong@...aro.org
To: Konrad Dybcio <konrad.dybcio@....qualcomm.com>,
 Konrad Dybcio <konradybcio@...nel.org>, Rob Clark <robdclark@...il.com>,
 Sean Paul <sean@...rly.run>, Abhinav Kumar <quic_abhinavk@...cinc.com>,
 Dmitry Baryshkov <lumag@...nel.org>, David Airlie <airlied@...il.com>,
 Simona Vetter <simona@...ll.ch>, Bjorn Andersson <andersson@...nel.org>,
 Rob Herring <robh@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>,
 Conor Dooley <conor+dt@...nel.org>
Cc: Marijn Suijten <marijn.suijten@...ainline.org>,
 linux-arm-msm@...r.kernel.org, dri-devel@...ts.freedesktop.org,
 freedreno@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
 devicetree@...r.kernel.org, Konrad Dybcio <konrad.dybcio@...aro.org>
Subject: Re: [PATCH RFT v6 2/5] drm/msm/adreno: Add speedbin data for SM8550 /
 A740

On 30/04/2025 18:39, Konrad Dybcio wrote:
> On 4/30/25 6:19 PM, neil.armstrong@...aro.org wrote:
>> On 30/04/2025 17:36, Konrad Dybcio wrote:
>>> On 4/30/25 4:49 PM, neil.armstrong@...aro.org wrote:
>>>> On 30/04/2025 15:09, Konrad Dybcio wrote:
>>>>> On 4/30/25 2:49 PM, neil.armstrong@...aro.org wrote:
>>>>>> On 30/04/2025 14:35, Konrad Dybcio wrote:
>>>>>>> On 4/30/25 2:26 PM, neil.armstrong@...aro.org wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 30/04/2025 13:34, Konrad Dybcio wrote:
>>>>>>>>> From: Konrad Dybcio <konrad.dybcio@...aro.org>
>>>>>>>>>
>>>>>>>>> Add speebin data for A740, as found on SM8550 and derivative SoCs.
>>>>>>>>>
>>>>>>>>> For non-development SoCs it seems that "everything except FC_AC, FC_AF
>>>>>>>>> should be speedbin 1", but what the values are for said "everything" are
>>>>>>>>> not known, so that's an exercise left to the user..
>>>>>>>>>
>>>>>>>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@...aro.org>
>>>>>>>>> ---
>>>>>>>>>       drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 8 ++++++++
>>>>>>>>>       1 file changed, 8 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
>>>>>>>>> index 53e2ff4406d8f0afe474aaafbf0e459ef8f4577d..61daa331567925e529deae5e25d6fb63a8ba8375 100644
>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
>>>>>>>>> @@ -11,6 +11,9 @@
>>>>>>>>>       #include "a6xx.xml.h"
>>>>>>>>>       #include "a6xx_gmu.xml.h"
>>>>>>>>>       +#include <linux/soc/qcom/smem.h>
>>>>>>>>> +#include <linux/soc/qcom/socinfo.h>
>>>>>>>>> +
>>>>>>>>>       static const struct adreno_reglist a612_hwcg[] = {
>>>>>>>>>           {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x22222222},
>>>>>>>>>           {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x02222220},
>>>>>>>>> @@ -1431,6 +1434,11 @@ static const struct adreno_info a7xx_gpus[] = {
>>>>>>>>>               },
>>>>>>>>>               .address_space_size = SZ_16G,
>>>>>>>>>               .preempt_record_size = 4192 * SZ_1K,
>>>>>>>>> +        .speedbins = ADRENO_SPEEDBINS(
>>>>>>>>> +            { ADRENO_SKU_ID(SOCINFO_FC_AC), 0 },
>>>>>>>>> +            { ADRENO_SKU_ID(SOCINFO_FC_AF), 0 },
>>>>>>>>> +            /* Other feature codes (on prod SoCs) should match to speedbin 1 */
>>>>>>>>
>>>>>>>> I'm trying to understand this sentence. because reading patch 4, when there's no match
>>>>>>>> devm_pm_opp_set_supported_hw() is simply never called so how can it match speedbin 1 ?
>>>>>>>
>>>>>>> What I'm saying is that all other entries that happen to be possibly
>>>>>>> added down the line are expected to be speedbin 1 (i.e. BIT(1))
>>>>>>>
>>>>>>>> Before this change the fallback was speedbin = BIT(0), but this disappeared.
>>>>>>>
>>>>>>> No, the default was to allow speedbin mask ~(0U)
>>>>>>
>>>>>> Hmm no:
>>>>>>
>>>>>>        supp_hw = fuse_to_supp_hw(info, speedbin);
>>>>>>
>>>>>>        if (supp_hw == UINT_MAX) {
>>>>>>            DRM_DEV_ERROR(dev,
>>>>>>                "missing support for speed-bin: %u. Some OPPs may not be supported by hardware\n",
>>>>>>                speedbin);
>>>>>>            supp_hw = BIT(0); /* Default */
>>>>>>        }
>>>>>>
>>>>>>        ret = devm_pm_opp_set_supported_hw(dev, &supp_hw, 1);
>>>>>>        if (ret)
>>>>>>            return ret;
>>>>>
>>>>> Right, that's my own code even..
>>>>>
>>>>> in any case, the kernel can't know about the speed bins that aren't
>>>>> defined and here we only define bin0, which doesn't break things
>>>>>
>>>>> the kernel isn't aware about hw with bin1 with or without this change
>>>>> so it effectively doesn't matter
>>>>
>>>> But it's regression for the other platforms, where before an unknown SKU
>>>> mapped to supp_hw=BIT(0)
>>>>
>>>> Not calling devm_pm_opp_set_supported_hw() is a major regression,
>>>> if the opp-supported-hw is present, the OPP will be rejected:
>>>
>>> A comment in patch 4 explains that. We can either be forwards or backwards
>>> compatible (i.e. accept a limited amount of
>>> speedbin_in_driver x speedbin_in_dt combinations)
>>
>> I have a hard time understanding the change, please be much more verbose
>> in the cover letter and commit messages.
>>
>> The fact that you do such a large change in the speedbin policy in patch 4
>> makes it hard to understand why it's needed in the first place.
>>
>> Finally I'm very concerned that "old" SM8550 DT won't work on new kernels,
>> this is frankly unacceptable, and this should be addressed in the first
>> place.
>>
>> The nvmem situation was much simple, where we considered we added the nvmem
>> property at the same time as opp-supported-hw in OPPs, but it's no more the
>> case.
>>
>> So I think the OPP API should probably be extended to address this situation
>> first, since if we do not have the opp-supported-hw in OPPs, all OPPs are safe.
>>
>> So this code:
>>      count = of_property_count_u32_elems(np, "opp-supported-hw");
>>      if (count <= 0 || count % levels) {
>>          dev_err(dev, "%s: Invalid opp-supported-hw property (%d)\n",
>>              __func__, count);
>>          return false;
>>      }
>> should return true in this specific case, like a supported_hw_failsafe mode.
> 
> Not really. opp-supported-hws = <BIT(0)> usually translates to the *fastest*
> bin in our case, so perhaps that change I made previously to default to it
> wasn't the wisest. In other words, all slower SKUs that weren't added to the
> kernel catalog & dt are potentially getting overclocked, which is no bueno.
> That is not always the case, but it most certainly has been for a number of
> years.
> 
> Old DTs in this case would be DTs lacking opp-supported-hw with the kernel
> having speedbin tables. The inverse ("too new DTs") case translates into
> "someone put some unexpected stuff in dt and the kernel has no idea what
> to do with it".
> In this context, old DTs would continue to work after patch 4, as the first
> early return in adreno_set_speedbin() takes care of that.

No.

With only patches 1-4 applied (keep "old" DT) on today's -next:

SM8550-QRD:
[    7.574569] msm_dpu ae01000.display-controller: bound ae94000.dsi (ops dsi_ops [msm])
[    7.586578] msm_dpu ae01000.display-controller: bound ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm])
[    7.597886] adreno 3d00000.gpu: error -EINVAL: Unknown speed bin fuse value: 0x2
[    7.605518] msm_dpu ae01000.display-controller: failed to load adreno gpu
[    7.612599] msm_dpu ae01000.display-controller: failed to bind 3d00000.gpu (ops a3xx_ops [msm]): -22

SM8550-HDK:
[   10.137558] msm_dpu ae01000.display-controller: bound ae94000.dsi (ops dsi_ops [msm])
[   10.151796] msm_dpu ae01000.display-controller: bound ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm])
[   10.163358] adreno 3d00000.gpu: error -EINVAL: Unknown speed bin fuse value: 0x2
[   10.171066] msm_dpu ae01000.display-controller: failed to load adreno gpu
[   10.178118] msm_dpu ae01000.display-controller: failed to bind 3d00000.gpu (ops a3xx_ops [msm]): -22

With:
=================><==================
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 61daa3315679..7cac14a585a9 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -1435,6 +1435,7 @@ static const struct adreno_info a7xx_gpus[] = {
                 .address_space_size = SZ_16G,
                 .preempt_record_size = 4192 * SZ_1K,
                 .speedbins = ADRENO_SPEEDBINS(
+                       { ADRENO_SKU_ID(SOCINFO_FC_AB), 1 },
                         { ADRENO_SKU_ID(SOCINFO_FC_AC), 0 },
                         { ADRENO_SKU_ID(SOCINFO_FC_AF), 0 },
                         /* Other feature codes (on prod SoCs) should match to speedbin 1 */
=================><==================

SM8550-QRD:
[    7.681816] msm_dpu ae01000.display-controller: bound ae94000.dsi (ops dsi_ops [msm])
[    7.694479] msm_dpu ae01000.display-controller: bound ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm])
[    7.705784] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[    7.714322] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[    7.722851] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[    7.722853] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[    7.722855] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[    7.722856] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[    7.722858] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[    7.722860] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[    7.722861] adreno 3d00000.gpu: _of_add_opp_table_v2: no supported OPPs
[    7.722863] adreno 3d00000.gpu: [drm:adreno_gpu_init [msm]] *ERROR* Unable to set the OPP table

SM8550-HDK:
[   10.119986] msm_dpu ae01000.display-controller: bound ae94000.dsi (ops dsi_ops [msm])
[   10.133872] msm_dpu ae01000.display-controller: bound ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm])
[   10.147377] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[   10.161640] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[   10.171198] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[   10.179756] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[   10.188313] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[   10.196868] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[   10.205424] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[   10.226025] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-supported-hw property (-22)
[   10.234589] adreno 3d00000.gpu: _of_add_opp_table_v2: no supported OPPs
[   10.247165] adreno 3d00000.gpu: [drm:adreno_gpu_init [msm]] *ERROR* Unable to set the OPP table

This behaves exactly as I said, so please fix it.

Neil

> 
> Konrad


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ