Message-ID: <Zg3LeYvOefchf1N3@arm.com>
Date: Wed, 3 Apr 2024 23:34:49 +0200
From: Beata Michalska <beata.michalska@....com>
To: Vanshidhar Konda <vanshikonda@...amperecomputing.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	ionela.voinescu@....com, sudeep.holla@....com, will@...nel.org,
	catalin.marinas@....com, vincent.guittot@...aro.org,
	sumitg@...dia.com, yang@...amperecomputing.com,
	lihuisong@...wei.com
Subject: Re: [PATCH v3 0/3] Add support for AArch64 AMUv1-based
 arch_freq_get_on_cpu

On Mon, Mar 25, 2024 at 09:10:26AM -0700, Vanshidhar Konda wrote:
> On Tue, Mar 12, 2024 at 08:34:28AM +0000, Beata Michalska wrote:
> > Introducing an arm64-specific version of arch_freq_get_on_cpu, cashing in
> > on the existing implementation for FIE and AMUv1 support: the frequency
> > scale factor, updated on each sched tick, serves as the base for
> > retrieving the frequency of a given CPU. It represents the average
> > frequency between ticks, so its accuracy is inherently limited.
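> > 
> > For reference, the arithmetic behind that scale factor boils down to the
> > two AMUv1 counters: the core cycle counter ticks at the current core
> > frequency, while the constant counter ticks at a fixed, known rate, so
> > the ratio of their deltas over a window gives the average frequency.
> > A minimal sketch (illustrative only, not the patch itself):
> > 
> > 	#include <stdint.h>
> > 
> > 	/* avg_freq = const_rate * d_core / d_const over the sampling window.
> > 	 * NB: a real implementation would also guard against overflow. */
> > 	static uint64_t avg_freq_hz(uint64_t d_core, uint64_t d_const,
> > 				    uint64_t const_rate_hz)
> > 	{
> > 		/* No constant-counter progress means no window to average over. */
> > 		return d_const ? (d_core * const_rate_hz) / d_const : 0;
> > 	}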
> > 
> > Due to some limitations, the changes have only been lightly tested, on
> > an FVP model.
> > 
> 
> I tested these changes on an Ampere system. The results from reading
> scaling_cur_freq look reasonable in the majority of cases I tested. I
> only saw some unexpected behavior with cores that were configured for
> nohz_full.
> 
> I observed the unexplained behavior when I tested as follows:
> 1. Run stress on all cores
>    stress-ng --cpu 186 --timeout 10m --metrics-brief
> 2. Observe scaling_cur_freq and cpuinfo_cur_freq for all cores
>    scaling_cur_freq values were within a few % of cpuinfo_cur_freq
> 3. Kill stress test
> 4. Observe scaling_cur_freq and cpuinfo_cur_freq for all cores
>    scaling_cur_freq values were within a few % of cpuinfo_cur_freq for
>    most cores, except the ones configured for nohz_full (a helper for
>    scripting these reads is sketched below).
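> 
>    A quick sketch for scripting the snapshots in steps 2 and 4 (assumes
>    only the standard cpufreq sysfs layout; error handling kept minimal):
> 
> 	#include <stdio.h>
> 
> 	/* Read one cpufreq sysfs file for a CPU; returns kHz or -1 on error. */
> 	static long read_khz(int cpu, const char *file)
> 	{
> 		char path[128];
> 		long khz = -1;
> 		FILE *f;
> 
> 		snprintf(path, sizeof(path),
> 			 "/sys/devices/system/cpu/cpu%d/cpufreq/%s", cpu, file);
> 		f = fopen(path, "r");
> 		if (f) {
> 			if (fscanf(f, "%ld", &khz) != 1)
> 				khz = -1;
> 			fclose(f);
> 		}
> 		return khz;
> 	}
> 
> 	/* e.g. compare read_khz(cpu, "scaling_cur_freq")
> 	 *         with read_khz(cpu, "cpuinfo_cur_freq") for each cpu */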
> 
> nohz_full cores: 122-127
> core   scaling_cur_freq (kHz)  cpuinfo_cur_freq (kHz)
> [122]: 2997070           1000000
> [123]: 2997070           1000000
> [124]: 3000038           1000000
> [125]: 2997070           1000000
> [126]: 2997070           1000000
> [127]: 2997070           1000000
> 
> These values persisted for multiple seconds. I suspect the cores
> entered WFI and the scale was not updated while those cores were
> idle.
>
Right, so the problem is with updating the counters upon entering idle: at
this point that is done for all CPUs, but it should exclude the full dynticks
ones - otherwise it leads to bad readings like the ones above. For nohz_full
cores, the cpufreq driver will have to take care of getting the current
frequency.
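
Roughly, the shape of what I have in mind (sketch only, all names
illustrative - not the actual fix):

	#include <stdbool.h>
	#include <stdint.h>

	struct amu_freq_sample {
		uint64_t last_update_ns;	/* refreshed on each sched tick */
		uint64_t freq_scale;		/* cached scale factor */
	};

	/*
	 * On a nohz_full CPU that has stopped ticking, the cached scale
	 * goes stale; detect that and let the caller fall back to the
	 * cpufreq driver instead of reporting a frozen value.
	 */
	static bool freq_sample_stale(const struct amu_freq_sample *s,
				      uint64_t now_ns, uint64_t tick_ns)
	{
		/* Tolerate a couple of missed ticks before giving up. */
		return (now_ns - s->last_update_ns) > 2 * tick_ns;
	}

With something like that in place, arch_freq_get_on_cpu could return 0
for a stale sample, which the cpufreq sysfs code already treats as "no
arch-provided value" and falls back to the driver.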

Will be sending a fix for that.

Thank you very much for testing - appreciate that!

---
BR
Beata
> Thanks,
> Vanshi
