linux-kernel - Re: [PATCH V4 0/5] arm_scmi: vendors: Qualcomm Generic Vendor Extensions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z2GOLFqnexEalBx_@pluto>
Date: Tue, 17 Dec 2024 14:45:24 +0000
From: Cristian Marussi <cristian.marussi@....com>
To: Sibi Sankar <quic_sibis@...cinc.com>
Cc: Sudeep Holla <sudeep.holla@....com>, Johan Hovold <johan@...nel.org>,
	Cristian Marussi <cristian.marussi@....com>, andersson@...nel.org,
	konrad.dybcio@...aro.org, robh+dt@...nel.org,
	krzysztof.kozlowski+dt@...aro.org, linux-kernel@...r.kernel.org,
	linux-arm-msm@...r.kernel.org, devicetree@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org, quic_rgottimu@...cinc.com,
	quic_kshivnan@...cinc.com, conor+dt@...nel.org,
	arm-scmi@...r.kernel.org
Subject: Re: [PATCH V4 0/5] arm_scmi: vendors: Qualcomm Generic Vendor
 Extensions

On Tue, Dec 17, 2024 at 05:55:35PM +0530, Sibi Sankar wrote:
> 
> 
> On 12/5/24 22:31, Sudeep Holla wrote:
> > On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
> > > On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
> > > > On 11/8/24 20:44, Johan Hovold wrote:
> > > 
> > > > > > On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
> > > 
> > > > > > > Second, after loading the protocol and client drivers manually (in that
> > > > > > > order, shouldn't the client driver pull in the protocol?), I got:
> > > > > > > 
> > > > > > > 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
> > > > > > > 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
> > > > > > > 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
> > > > > > > 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
> > > > > > > 
> > > > > > > which seems to suggest that the firmware on my CRD does not support this
> > > > > > > feature. Is that the way this should be interpreted? And does that mean
> > > > > > > that non of the commercial laptops supports this either?
> > > 
> > > > > Yeah, hopefully Sibi can shed some light on this. I'm using the DT
> > > > > patch (5/5) from this series, which according to the commit message is
> > > > > supposed to enable bus scaling on the x1e80100 platform. So I guess
> > > > > something is missing in my firmware.
> > > > 
> > > > Nah, it's probably just because of the algo string used.
> > > > The past few series used caps MEMLAT string instead of
> > > > memlat to pass the tuneables, looks like all the laptops
> > > > havn't really switched to it yet. Will revert back to
> > > > using to lower case memlat so that all devices are
> > > > supported. Thanks for trying the series out!
> > > 
> > > I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
> > > there too, and there I do *not* see the above mentioned -EOPNOSUPP error
> > > and the memlat driver probes successfully.
> > > 
> > > On the other hand, this series seems to have no effect on a kernel
> > > compilation benchmark. Is that expected?
> > > 
> > 
> > Hijacking this thread to rant about state of firmware implementation on
> > this platform that gives me zero confidence in merging any of these without
> > examining each of the interface details in depth and at lengths.
> > 
> 

Hi Sibi,

> Hey Sudeep,
> 
> Thanks for taking time to review the series.
> 
> > Also I see the standard protocol like PERF seem to have so many issues which
> > adds to my no confidence. I can't comment on that thread for specific reasons.
> 
> ^^ is largely untrue, a lot of finger pointing and a gross
> misrepresentation of reality :/
> 
> The only major problem that X1E perf protocol has is a firmware
> crash in the LEVEL_GET regular message implementation. This
> pretty much went unnoticed because of messaging in perf implementation
> in kernel. Given the fastchannel implementation isn't mandatory
> according to spec, the kernel clearly says it switches to
> regular messaging when it clearly doesn't do that and uses
> stale data structures instead. This ensured that level get regular
> messaging never got tested.

You claimed this a couple of times here and on IRC, but sincerely,
looking at the fastchannel implementation in SCMI core and Perf, I could
not track down where this could have happened in the recent code
(i.e. with or without your recent, welcomed, patches...)

When FC initialization fails and bailout it says:
	
	"Failed to get FC for protocol %X [MSG_ID:%u / RES_ID:%u] - ret:%d. Using regular messaging."

... and it clears any gathered address for that FC, so that in __scmi_perf_level_get()
you end up skipping the FC machinery and use messaging

	if (dom->fc_info && dom->fc_info[PERF_FC_LEVEL].get_addr) {
		...
	}

	return scmi_perf_msg_level_get(ph, dom->id, level, poll);

Now this is done ONLY for the FC that specifically failed
initialization, i.e. identified by the tuple PROTO_ID/MSG_ID/RES_ID
(as stated in the noisy message above where MSG_ID is specified) NOT for
all Fastchannel, so you can have an FC successfully initialized only on
the GET but failing in the SET, so only the GET FC will be used.

I dont really understand how the Kernel was misbehaving and using
instead stale data, neither, if this was the case, I can see where this
issue would have been fixed.

To be clear, I am not really interested in throwing an argument here, but
I sincerely dont see where the alleged problem was and how was fixed (kernel
side), so I fear it could be still there, hidden maybe by a change in the
platform fw.

Apologies if I missed something along the history of this..

Thanks,
Cristian