lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <408d7048-7ebb-4b32-bdfc-5e26a5c449e0@linaro.org>
Date: Tue, 9 Jul 2024 11:39:51 +0200
From: Konrad Dybcio <konrad.dybcio@...aro.org>
To: Johan Hovold <johan@...nel.org>, Sibi Sankar <quic_sibis@...cinc.com>
Cc: sudeep.holla@....com, cristian.marussi@....com, andersson@...nel.org,
 jassisinghbrar@...il.com, robh+dt@...nel.org,
 krzysztof.kozlowski+dt@...aro.org, dmitry.baryshkov@...aro.org,
 linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org,
 devicetree@...r.kernel.org, quic_rgottimu@...cinc.com,
 quic_kshivnan@...cinc.com, conor+dt@...nel.org, quic_nkela@...cinc.com,
 quic_psodagud@...cinc.com, abel.vesa@...aro.org
Subject: Re: [PATCH V6 5/5] arm64: dts: qcom: x1e80100: Enable cpufreq

On 9.07.2024 11:13 AM, Johan Hovold wrote:
> Hi Sibi,
> 
> On Wed, Jul 03, 2024 at 01:29:11AM +0530, Sibi Sankar wrote:
>> On 7/2/24 21:25, Johan Hovold wrote:
>>> On Wed, Jun 12, 2024 at 06:10:56PM +0530, Sibi Sankar wrote:
>>>> Enable cpufreq on X1E80100 SoCs through the SCMI perf protocol node.
> 
>>> This series gives a nice performance boost on the x1e80100 CRD, but I'm
>>> seeing a bunch of warnings and errors that need to be addressed:
>>>
>>> [    9.533053] arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:0] - ret:-95. Using regular messaging.
>>> [    9.549458] arm-scmi firmware:scmi: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    9.563925] arm-scmi firmware:scmi: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    9.572835] arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:1] - ret:-95. Using regular messaging.
>>> [    9.609471] arm-scmi firmware:scmi: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    9.633341] arm-scmi firmware:scmi: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    9.650000] arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:2] - ret:-95. Using regular messaging.
>>
>> X1E uses fast channels only for message-id: 7 (level set) and regular
>> channels for all the other messages. The spec doesn't mandate fast
>> channels for any of the supported message ids for the perf protocol.
>> So nothing to fix here.
> 
> I didn't look at this in any detail, but if the firmware is spec
> compliant you should not be spamming the logs with warnings. Not sure
> how best to address that, but you could, for example, add a quirk for
> qcom fw or at a minimum demote this mess to info level.
> 
> Also the failure to add oops_by_lvl appears to be a separate issue (e.g.
> related to the duplicate entries).
> 
>>> [    9.727098] cpu cpu4: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
>>> [    9.737157] cpu cpu4: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
>>> [    9.875039] cpu cpu8: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
>>> [    9.888428] cpu cpu8: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
>>
>> The duplicate entries reported by the perf protocol come directly from
>> the speed bins. I was told the duplicate entry with volt 0 is meant to
>> indicate a lower power way of achieving the said frequency at a lower
>> core count. We have no way of using it in the kernel and it gets safely
>> discarded. So again nothing to fix in the kernel.
> 
> Again, you should not be spamming the logs with warnings for things are
> benign (e.g. as it may prevent people from noticing real issues).
> 
> Also these duplicate entries do not seem to get safely discarded as they
> result in a bunch of operations failing loudly at boot (e.g. the
> oops_by_lvl warning above) and similarly at resume as I recently
> noticed:
> 
> [   42.690569] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
> [   42.704360] cpu cpu4: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
> [   42.737865] cpu cpu4: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
> [   42.752943] debugfs: File 'cpu5' in directory 'opp' already present!
> [   42.759956] debugfs: File 'cpu6' in directory 'opp' already present!
> [   42.766641] debugfs: File 'cpu7' in directory 'opp' already present!
> ...
> [   42.855520] CPU8: Booted secondary processor 0x0000020000 [0x511f0011]
> [   42.865188] cpu cpu8: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
> [   42.898494] cpu cpu8: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
> [   42.913559] debugfs: File 'cpu9' in directory 'opp' already present!
> [   42.920265] debugfs: File 'cpu10' in directory 'opp' already present!
> [   42.927029] debugfs: File 'cpu11' in directory 'opp' already present!
> 
> Perhaps you can find some way to filter out the unused, duplicate
> entries for qualcomm fw so that all of these issues go away.

I would say that the firmware should probably change the PSTATEs'
"enabled" state based on availability and report that to the OS..
Or the OS should know the conditions (enabled core count as you mentioned)
and decide whether it makes sense to shut down these cores, based on
workloads.. The latter sounds more sane..

The SCMI perf protocol already exposes power metrics (through opp->power)
for EAS purposes, so perhaps additional field could be added (cpu mask /
cpu count, depending on whether the specific cores being off is meaningful)
so that the OS can make more educated choices here.. otherwise this almost
looks like a hack that made it into the firmware because there was no
time left or something..

You mentioned that "We have no way of using it in the kernel", but is that
actually true? Can you not set that OPP if the conditions are met?

Konrad

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ