[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1ca4f5d7-6652-b8d0-3c67-b8b2bdda7163@quicinc.com>
Date: Mon, 23 Dec 2024 19:39:01 +0530
From: Sibi Sankar <quic_sibis@...cinc.com>
To: Cristian Marussi <cristian.marussi@....com>
CC: Sudeep Holla <sudeep.holla@....com>, Johan Hovold <johan@...nel.org>,
<andersson@...nel.org>, <konrad.dybcio@...aro.org>,
<robh+dt@...nel.org>, <krzysztof.kozlowski+dt@...aro.org>,
<linux-kernel@...r.kernel.org>, <linux-arm-msm@...r.kernel.org>,
<devicetree@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
<quic_rgottimu@...cinc.com>, <quic_kshivnan@...cinc.com>,
<conor+dt@...nel.org>, <arm-scmi@...r.kernel.org>
Subject: Re: [PATCH V4 0/5] arm_scmi: vendors: Qualcomm Generic Vendor
Extensions
On 12/17/24 20:15, Cristian Marussi wrote:
> On Tue, Dec 17, 2024 at 05:55:35PM +0530, Sibi Sankar wrote:
>>
>>
>> On 12/5/24 22:31, Sudeep Holla wrote:
>>> On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
>>>> On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
>>>>> On 11/8/24 20:44, Johan Hovold wrote:
>>>>
>>>>>>> On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
>>>>
>>>>>>>> Second, after loading the protocol and client drivers manually (in that
>>>>>>>> order, shouldn't the client driver pull in the protocol?), I got:
>>>>>>>>
>>>>>>>> scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm 20000
>>>>>>>> arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
>>>>>>>> scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
>>>>>>>> scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
>>>>>>>>
>>>>>>>> which seems to suggest that the firmware on my CRD does not support this
>>>>>>>> feature. Is that the way this should be interpreted? And does that mean
>>>>>>>> that non of the commercial laptops supports this either?
>>>>
>>>>>> Yeah, hopefully Sibi can shed some light on this. I'm using the DT
>>>>>> patch (5/5) from this series, which according to the commit message is
>>>>>> supposed to enable bus scaling on the x1e80100 platform. So I guess
>>>>>> something is missing in my firmware.
>>>>>
>>>>> Nah, it's probably just because of the algo string used.
>>>>> The past few series used caps MEMLAT string instead of
>>>>> memlat to pass the tuneables, looks like all the laptops
>>>>> havn't really switched to it yet. Will revert back to
>>>>> using to lower case memlat so that all devices are
>>>>> supported. Thanks for trying the series out!
>>>>
>>>> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
>>>> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
>>>> and the memlat driver probes successfully.
>>>>
>>>> On the other hand, this series seems to have no effect on a kernel
>>>> compilation benchmark. Is that expected?
>>>>
>>>
>>> Hijacking this thread to rant about state of firmware implementation on
>>> this platform that gives me zero confidence in merging any of these without
>>> examining each of the interface details in depth and at lengths.
>>>
>>
>
> Hi Sibi,
>
>> Hey Sudeep,
>>
>> Thanks for taking time to review the series.
>>
>>> Also I see the standard protocol like PERF seem to have so many issues which
>>> adds to my no confidence. I can't comment on that thread for specific reasons.
>>
>> ^^ is largely untrue, a lot of finger pointing and a gross
>> misrepresentation of reality :/
>>
>> The only major problem that X1E perf protocol has is a firmware
>> crash in the LEVEL_GET regular message implementation. This
>> pretty much went unnoticed because of messaging in perf implementation
>> in kernel. Given the fastchannel implementation isn't mandatory
>> according to spec, the kernel clearly says it switches to
>> regular messaging when it clearly doesn't do that and uses
>> stale data structures instead. This ensured that level get regular
>> messaging never got tested.
>
> You claimed this a couple of times here and on IRC, but sincerely,
> looking at the fastchannel implementation in SCMI core and Perf, I could
> not track down where this could have happened in the recent code
> (i.e. with or without your recent, welcomed, patches...)
>
> When FC initialization fails and bailout it says:
>
> "Failed to get FC for protocol %X [MSG_ID:%u / RES_ID:%u] - ret:%d. Using regular messaging."
>
> ... and it clears any gathered address for that FC, so that in __scmi_perf_level_get()
> you end up skipping the FC machinery and use messaging
>
> if (dom->fc_info && dom->fc_info[PERF_FC_LEVEL].get_addr) {
> ...
> }
>
> return scmi_perf_msg_level_get(ph, dom->id, level, poll);
>
> Now this is done ONLY for the FC that specifically failed
> initialization, i.e. identified by the tuple PROTO_ID/MSG_ID/RES_ID
> (as stated in the noisy message above where MSG_ID is specified) NOT for
> all Fastchannel, so you can have an FC successfully initialized only on
> the GET but failing in the SET, so only the GET FC will be used.
>
> I dont really understand how the Kernel was misbehaving and using
> instead stale data, neither, if this was the case, I can see where this
> issue would have been fixed.
>
> To be clear, I am not really interested in throwing an argument here, but
> I sincerely dont see where the alleged problem was and how was fixed (kernel
> side), so I fear it could be still there, hidden maybe by a change in the
> platform fw.
>
> Apologies if I missed something along the history of this..
lol, this is pretty embarrassing :|, It's just like you said
looks like this fw supports get_level fastchannel but fails
to say it supports it. This was the reason behind get_level
regular message for being never tested and being buggy and
had nothing to do the kernel messaging or being buggy.
My bad :(, sry again.
-Sibi
>
> Thanks,
> Cristian
Powered by blists - more mailing lists