Message-ID: <98c8fb8f-1fe6-1c05-2093-67efc7ec582a@linaro.org>
Date: Thu, 1 Jun 2023 15:29:07 +0200
From: Konrad Dybcio <konrad.dybcio@...aro.org>
To: Stephan Gerhold <stephan@...hold.net>
Cc: Andy Gross <agross@...nel.org>,
Bjorn Andersson <andersson@...nel.org>,
Michael Turquette <mturquette@...libre.com>,
Stephen Boyd <sboyd@...nel.org>,
Georgi Djakov <djakov@...nel.org>,
Leo Yan <leo.yan@...aro.org>,
Evan Green <evgreen@...omium.org>,
Marijn Suijten <marijn.suijten@...ainline.org>,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-clk@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: [PATCH 20/20] interconnect: qcom: Divide clk rate by src node bus width

On 1.06.2023 15:23, Stephan Gerhold wrote:
> On Thu, Jun 01, 2023 at 02:43:50PM +0200, Konrad Dybcio wrote:
>> On 30.05.2023 21:02, Stephan Gerhold wrote:
>>> On Tue, May 30, 2023 at 06:32:04PM +0200, Konrad Dybcio wrote:
>>>> On 30.05.2023 12:20, Konrad Dybcio wrote:
>>>>> Ever since the introduction of SMD RPM ICC, we've been dividing the
>>>>> clock rate by the wrong bus width. This has resulted in:
>>>>>
>>>>> - setting wrong (mostly too low) rates, affecting performance
>>>>> - most often /2 or /4
>>>>> - things like DDR never hit their full potential
>>>>> - the rates were only correct if src bus width == dst bus width
>>>>> for all src, dst pairs on a given bus
>>>>>
>>>>> - Qualcomm using the same wrong logic in their BSP driver in msm-5.x
>>>>> that ships in production devices today
>>>>>
>>>>> - me losing my sanity trying to find this
>>>>>
>>>>> Resolve it by using dst_qn, if it exists.
>>>>>
>>>>> Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@...aro.org>
>>>>> ---
>>>> The problem is deeper.
>>>>
>>>> Chatting with Stephan (+CC), we tackled a few issues (that I will send
>>>> fixes for in v2):
>>>>
>>>> 1. qcom_icc_rpm_set() should take per-node (src_qn->sum_avg, dst_qn->sum_avg)
>>>> and NOT aggregated bw (unless you want ALL of your nodes on a given provider
>>>> to "go very fast")
>>>>
>>>> 2. the aggregate bw/clk rate calculation should use the node-specific bus widths
>>>> and not only the bus width of the src/dst node, otherwise the average bw
>>>> values will be utterly meaningless
>>>>
>>>
>>> The peak bandwidth / clock rate is wrong as well if you have two paths
>>> with different buswidths on the same bus/NoC. (If someone is interested
>>> in details I can post my specific example I had in the chat, it shows
>>> this more clearly.)
>> agg_peak takes care of that, I believe..
>>
>
> I was just nitpicking on your description here, I think the solution
> you/we had in mind was already correct. :)
>
>>
>>>
>>>> 3. thanks to (1) and (2) qcom_icc_bus_aggregate() can be remodeled to instead
>>>> calculate the clock rates for the two rpm contexts, which we can then max()
>>>> and pass on to the ratesetting call
>>>>
>>>
>>> Sounds good.
>>>
>>>>
>>>> ----8<---- Cutting off Stephan's seal of approval, this is my thinking ----
>>>>
>>>> 4. I *think* Qualcomm really made a mistake in their msm-5.4 driver where they
>>>> took most of the logic from the current -next state and should have been
>>>> setting the rate based on the *DST* provider, or at least that's my
>>>> understanding trying to read the "known good" msm-4.19 driver
>>>> (which remembers msm-3.0 lol).. Or maybe we should keep src but ensure there's
>>>> also a final (dst, dst) vote cast:
>>>>
>>>> provider->inter_set = false // current state upstream
>>>>
>>>> setting apps_proc<->slv_bimc_snoc
>>>> setting mas_bimc_snoc<->slv_snoc_cnoc
>>>> setting mas_snoc_cnoc<->qhs_sdc2
>>>>
>>>>
>>>> provider->inter_set = true // I don't think there's effectively a difference?
>>>>
>>>> setting apps_proc<->slv_bimc_snoc
>>>> setting slv_bimc_snoc<->mas_bimc_snoc
>>>> setting mas_bimc_snoc<->slv_snoc_cnoc
>>>> setting slv_snoc_cnoc<->mas_snoc_cnoc
>>>> setting mas_snoc_cnoc<->qhs_sdc2
>>>>
>>>
>>> I think with our proposed changes above it does no longer matter if a
>>> node is passed as "src" or "dst". This means in your example above you
>>> just waste additional time setting the bandwidth twice for
>>> slv_bimc_snoc, mas_bimc_snoc, slv_snoc_cnoc and mas_snoc_cnoc.
>>> The final outcome is the same with or without "inter_set".
>> Yeah I guess due to the fact that two "real" nodes are always
>> connected by a set of "gateway" nodes, the rate will be applied..
>>
>> I am however not sure if we're supposed to set the bandwidth
>> (via qcom_icc_rpm_set()) on all of them..
>>
>
> I think so? The nodes RPM doesn't care about shouldn't have
> a slv/mas_rpm_id.
Hm I guess the inter_set doesn't make a difference anyway, as you
pointed out.. Thankfully one thing less to fix :D
Konrad
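
For readers following points 1-3 above, here is a rough sketch of the idea, not the actual driver code: each node's bandwidth is divided by that node's own bus width, the largest resulting rate is taken per RPM context, and the two context rates are then max()'d into the single rate handed to the rate-setting call. All names here (Node, node_rate, bus_clk_rate, NUM_CTX) are hypothetical, and the units (bytes/s for bandwidth, bytes per cycle for bus width) are an assumption made for illustration.

```python
from dataclasses import dataclass, field

NUM_CTX = 2  # assumption: two RPM contexts, as mentioned in point 3


@dataclass
class Node:
    """Hypothetical stand-in for a node on one bus/NoC."""
    name: str
    buswidth: int  # assumed unit: bytes transferred per clock cycle
    # Assumed unit: bytes/s requested on this node, one entry per context.
    bw: list = field(default_factory=lambda: [0] * NUM_CTX)


def node_rate(node: Node, ctx: int) -> int:
    """Clock rate (Hz) this node needs in one context, rounded up."""
    return -(-node.bw[ctx] // node.buswidth)  # ceiling division


def bus_clk_rate(nodes: list) -> int:
    """Rate to request for the bus clock shared by all given nodes."""
    per_ctx = [
        max((node_rate(n, ctx) for n in nodes), default=0)
        for ctx in range(NUM_CTX)
    ]
    # Point 3: max() the per-context rates before setting the rate.
    return max(per_ctx)


# Two nodes on the same bus with different bus widths.
wide = Node("wide_node", buswidth=8, bw=[800, 0])
narrow = Node("narrow_node", buswidth=4, bw=[600, 400])
rate = bus_clk_rate([wide, narrow])
```

In this made-up example the narrow node decides the rate even though it requests less bandwidth, which is the case that dividing an aggregated bandwidth by a single bus width gets wrong.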