[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f5734944-1ed2-4acc-a015-0c638c331bbe@quicinc.com>
Date: Thu, 1 May 2025 14:59:58 +0530
From: Akhil P Oommen <quic_akhilpo@...cinc.com>
To: Konrad Dybcio <konrad.dybcio@....qualcomm.com>,
<neil.armstrong@...aro.org>
CC: Konrad Dybcio <konradybcio@...nel.org>, Rob Clark <robdclark@...il.com>,
Sean Paul <sean@...rly.run>, Abhinav Kumar <quic_abhinavk@...cinc.com>,
Dmitry Baryshkov <lumag@...nel.org>, David Airlie <airlied@...il.com>,
"Simona Vetter" <simona@...ll.ch>,
Bjorn Andersson <andersson@...nel.org>, Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>,
Marijn Suijten
<marijn.suijten@...ainline.org>,
<linux-arm-msm@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>,
<freedreno@...ts.freedesktop.org>, <linux-kernel@...r.kernel.org>,
<devicetree@...r.kernel.org>,
Konrad Dybcio
<konrad.dybcio@...aro.org>
Subject: Re: [PATCH RFT v6 2/5] drm/msm/adreno: Add speedbin data for SM8550 /
A740
On 4/30/2025 10:26 PM, neil.armstrong@...aro.org wrote:
> On 30/04/2025 18:39, Konrad Dybcio wrote:
>> On 4/30/25 6:19 PM, neil.armstrong@...aro.org wrote:
>>> On 30/04/2025 17:36, Konrad Dybcio wrote:
>>>> On 4/30/25 4:49 PM, neil.armstrong@...aro.org wrote:
>>>>> On 30/04/2025 15:09, Konrad Dybcio wrote:
>>>>>> On 4/30/25 2:49 PM, neil.armstrong@...aro.org wrote:
>>>>>>> On 30/04/2025 14:35, Konrad Dybcio wrote:
>>>>>>>> On 4/30/25 2:26 PM, neil.armstrong@...aro.org wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 30/04/2025 13:34, Konrad Dybcio wrote:
>>>>>>>>>> From: Konrad Dybcio <konrad.dybcio@...aro.org>
>>>>>>>>>>
>>>>>>>>>> Add speebin data for A740, as found on SM8550 and derivative
>>>>>>>>>> SoCs.
>>>>>>>>>>
>>>>>>>>>> For non-development SoCs it seems that "everything except
>>>>>>>>>> FC_AC, FC_AF
>>>>>>>>>> should be speedbin 1", but what the values are for said
>>>>>>>>>> "everything" are
>>>>>>>>>> not known, so that's an exercise left to the user..
>>>>>>>>>>
>>>>>>>>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
>>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@...aro.org>
>>>>>>>>>> ---
>>>>>>>>>> drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 8 ++++++++
>>>>>>>>>> 1 file changed, 8 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/
>>>>>>>>>> drivers/gpu/drm/msm/adreno/a6xx_catalog.c
>>>>>>>>>> index
>>>>>>>>>> 53e2ff4406d8f0afe474aaafbf0e459ef8f4577d..61daa331567925e529deae5e25d6fb63a8ba8375 100644
>>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
>>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
>>>>>>>>>> @@ -11,6 +11,9 @@
>>>>>>>>>> #include "a6xx.xml.h"
>>>>>>>>>> #include "a6xx_gmu.xml.h"
>>>>>>>>>> +#include <linux/soc/qcom/smem.h>
>>>>>>>>>> +#include <linux/soc/qcom/socinfo.h>
>>>>>>>>>> +
>>>>>>>>>> static const struct adreno_reglist a612_hwcg[] = {
>>>>>>>>>> {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x22222222},
>>>>>>>>>> {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x02222220},
>>>>>>>>>> @@ -1431,6 +1434,11 @@ static const struct adreno_info
>>>>>>>>>> a7xx_gpus[] = {
>>>>>>>>>> },
>>>>>>>>>> .address_space_size = SZ_16G,
>>>>>>>>>> .preempt_record_size = 4192 * SZ_1K,
>>>>>>>>>> + .speedbins = ADRENO_SPEEDBINS(
>>>>>>>>>> + { ADRENO_SKU_ID(SOCINFO_FC_AC), 0 },
>>>>>>>>>> + { ADRENO_SKU_ID(SOCINFO_FC_AF), 0 },
>>>>>>>>>> + /* Other feature codes (on prod SoCs) should
>>>>>>>>>> match to speedbin 1 */
>>>>>>>>>
>>>>>>>>> I'm trying to understand this sentence. because reading patch
>>>>>>>>> 4, when there's no match
>>>>>>>>> devm_pm_opp_set_supported_hw() is simply never called so how
>>>>>>>>> can it match speedbin 1 ?
>>>>>>>>
>>>>>>>> What I'm saying is that all other entries that happen to be
>>>>>>>> possibly
>>>>>>>> added down the line are expected to be speedbin 1 (i.e. BIT(1))
>>>>>>>>
>>>>>>>>> Before this change the fallback was speedbin = BIT(0), but this
>>>>>>>>> disappeared.
>>>>>>>>
>>>>>>>> No, the default was to allow speedbin mask ~(0U)
>>>>>>>
>>>>>>> Hmm no:
>>>>>>>
>>>>>>> supp_hw = fuse_to_supp_hw(info, speedbin);
>>>>>>>
>>>>>>> if (supp_hw == UINT_MAX) {
>>>>>>> DRM_DEV_ERROR(dev,
>>>>>>> "missing support for speed-bin: %u. Some OPPs may
>>>>>>> not be supported by hardware\n",
>>>>>>> speedbin);
>>>>>>> supp_hw = BIT(0); /* Default */
>>>>>>> }
>>>>>>>
>>>>>>> ret = devm_pm_opp_set_supported_hw(dev, &supp_hw, 1);
>>>>>>> if (ret)
>>>>>>> return ret;
>>>>>>
>>>>>> Right, that's my own code even..
>>>>>>
>>>>>> in any case, the kernel can't know about the speed bins that aren't
>>>>>> defined and here we only define bin0, which doesn't break things
>>>>>>
>>>>>> the kernel isn't aware about hw with bin1 with or without this change
>>>>>> so it effectively doesn't matter
>>>>>
>>>>> But it's regression for the other platforms, where before an
>>>>> unknown SKU
>>>>> mapped to supp_hw=BIT(0)
>>>>>
>>>>> Not calling devm_pm_opp_set_supported_hw() is a major regression,
>>>>> if the opp-supported-hw is present, the OPP will be rejected:
>>>>
>>>> A comment in patch 4 explains that. We can either be forwards or
>>>> backwards
>>>> compatible (i.e. accept a limited amount of
>>>> speedbin_in_driver x speedbin_in_dt combinations)
>>>
>>> I have a hard time understanding the change, please be much more verbose
>>> in the cover letter and commit messages.
>>>
>>> The fact that you do such a large change in the speedbin policy in
>>> patch 4
>>> makes it hard to understand why it's needed in the first place.
>>>
>>> Finally I'm very concerned that "old" SM8550 DT won't work on new
>>> kernels,
>>> this is frankly unacceptable, and this should be addressed in the first
>>> place.
>>>
>>> The nvmem situation was much simple, where we considered we added the
>>> nvmem
>>> property at the same time as opp-supported-hw in OPPs, but it's no
>>> more the
>>> case.
>>>
>>> So I think the OPP API should probably be extended to address this
>>> situation
>>> first, since if we do not have the opp-supported-hw in OPPs, all OPPs
>>> are safe.
>>>
>>> So this code:
>>> count = of_property_count_u32_elems(np, "opp-supported-hw");
>>> if (count <= 0 || count % levels) {
>>> dev_err(dev, "%s: Invalid opp-supported-hw property (%d)\n",
>>> __func__, count);
>>> return false;
>>> }
>>> should return true in this specific case, like a
>>> supported_hw_failsafe mode.
>>
>> Not really. opp-supported-hws = <BIT(0)> usually translates to the
>> *fastest*
>> bin in our case, so perhaps that change I made previously to default
>> to it
>> wasn't the wisest. In other words, all slower SKUs that weren't added
>> to the
>> kernel catalog & dt are potentially getting overclocked, which is no
>> bueno.
>> That is not always the case, but it most certainly has been for a
>> number of
>> years.
>>
>> Old DTs in this case would be DTs lacking opp-supported-hw with the
>> kernel
>> having speedbin tables. The inverse ("too new DTs") case translates into
>> "someone put some unexpected stuff in dt and the kernel has no idea what
>> to do with it".
>> In this context, old DTs would continue to work after patch 4, as the
>> first
>> early return in adreno_set_speedbin() takes care of that.
>
> No.
>
> With only patches 1-4 applied (keep "old" DT) on today's -next:
>
> SM8550-QRD:
> [ 7.574569] msm_dpu ae01000.display-controller: bound ae94000.dsi
> (ops dsi_ops [msm])
> [ 7.586578] msm_dpu ae01000.display-controller: bound
> ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm])
> [ 7.597886] adreno 3d00000.gpu: error -EINVAL: Unknown speed bin fuse
> value: 0x2
> [ 7.605518] msm_dpu ae01000.display-controller: failed to load adreno
> gpu
> [ 7.612599] msm_dpu ae01000.display-controller: failed to bind
> 3d00000.gpu (ops a3xx_ops [msm]): -22
>
> SM8550-HDK:
> [ 10.137558] msm_dpu ae01000.display-controller: bound ae94000.dsi
> (ops dsi_ops [msm])
> [ 10.151796] msm_dpu ae01000.display-controller: bound
> ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm])
> [ 10.163358] adreno 3d00000.gpu: error -EINVAL: Unknown speed bin fuse
> value: 0x2
> [ 10.171066] msm_dpu ae01000.display-controller: failed to load adreno
> gpu
> [ 10.178118] msm_dpu ae01000.display-controller: failed to bind
> 3d00000.gpu (ops a3xx_ops [msm]): -22
>
> With:
> =================><==================
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/
> drm/msm/adreno/a6xx_catalog.c
> index 61daa3315679..7cac14a585a9 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> @@ -1435,6 +1435,7 @@ static const struct adreno_info a7xx_gpus[] = {
> .address_space_size = SZ_16G,
> .preempt_record_size = 4192 * SZ_1K,
> .speedbins = ADRENO_SPEEDBINS(
> + { ADRENO_SKU_ID(SOCINFO_FC_AB), 1 },
> { ADRENO_SKU_ID(SOCINFO_FC_AC), 0 },
> { ADRENO_SKU_ID(SOCINFO_FC_AF), 0 },
> /* Other feature codes (on prod SoCs) should
> match to speedbin 1 */
> =================><==================
>
> SM8550-QRD:
> [ 7.681816] msm_dpu ae01000.display-controller: bound ae94000.dsi
> (ops dsi_ops [msm])
> [ 7.694479] msm_dpu ae01000.display-controller: bound
> ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm])
> [ 7.705784] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 7.714322] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 7.722851] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 7.722853] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 7.722855] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 7.722856] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 7.722858] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 7.722860] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 7.722861] adreno 3d00000.gpu: _of_add_opp_table_v2: no supported OPPs
> [ 7.722863] adreno 3d00000.gpu: [drm:adreno_gpu_init [msm]] *ERROR*
> Unable to set the OPP table
>
> SM8550-HDK:
> [ 10.119986] msm_dpu ae01000.display-controller: bound ae94000.dsi
> (ops dsi_ops [msm])
> [ 10.133872] msm_dpu ae01000.display-controller: bound
> ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm])
> [ 10.147377] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 10.161640] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 10.171198] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 10.179756] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 10.188313] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 10.196868] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 10.205424] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 10.226025] adreno 3d00000.gpu: _opp_is_supported: Invalid opp-
> supported-hw property (-22)
> [ 10.234589] adreno 3d00000.gpu: _of_add_opp_table_v2: no supported OPPs
> [ 10.247165] adreno 3d00000.gpu: [drm:adreno_gpu_init [msm]] *ERROR*
> Unable to set the OPP table
>
> This behaves exactly as I said, so please fix it.
Konrad,
iirc, we discussed this in one of the earlier revision. There is a
circular dependency between the driver change for SKU support and the dt
change that adds supported_hw bitmask in opp-table. Only scenario it
works is when you add these to the initial patches series of a new GPU.
It will be very useful if we can break this circular dependency.
-Akhil.
>
> Neil
>
>>
>> Konrad
>
>
Powered by blists - more mailing lists