Message-ID: <CAJZ5v0jf-NGa4-xaNaxehkLGPVqwhZrUhLXw2cJ1avtjgT5yPA@mail.gmail.com>
Date: Tue, 19 Apr 2022 20:49:19 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Doug Smythies <dsmythies@...us.net>
Cc: Thomas Gleixner <tglx@...utronix.de>,
"the arch/x86 maintainers" <x86@...nel.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Linux PM <linux-pm@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [patch 00/10] x86/cpu: Consolidate APERF/MPERF code
On Tue, Apr 19, 2022 at 7:32 PM Doug Smythies <dsmythies@...us.net> wrote:
>
> Hi Thomas,
>
> On 2022.04.15 12:20 Thomas Gleixner wrote:
>
> > APERF/MPERF is utilized in two ways:
> >
> > 1) Ad hoc readout of CPU frequency which requires IPIs
> >
> > 2) Frequency scale calculation for frequency invariant scheduling which
> > reads APERF/MPERF on every tick.
> >
> > These are completely independent code parts. Eric observed long latencies
> > when reading /proc/cpuinfo which reads out CPU frequency via #1 and
> > proposed to replace the per CPU single IPI with a broadcast IPI.
> >
> > While this makes the latency smaller, it is not necessary at all because #2
> > samples APERF/MPERF periodically, except on idle or isolated NOHZ full CPUs
> > which are excluded from IPI already.
> >
> > It could be argued that not all APERF/MPERF capable systems have the
> > required BIOS information to enable frequency invariance support, but in
> > practice most of them do. So the APERF/MPERF sampling can be made
> > unconditional and just the frequency scale calculation for the scheduler
> > excluded.
> >
> > The following series consolidates that.
>
> I have tested this patch set with the acpi-cpufreq, intel_cpufreq (passive),
> and intel_pstate (active) CPU frequency scaling drivers and various
> governors, with HWP both enabled and disabled.
>
> For intel_pstate (active), with HWP either enabled or disabled, the
> behaviour of scaling_cur_freq is inconsistent both with the behaviour
> prior to this patch set and with that of the other scaling driver and
> governor combinations.
>
> Note that there is no issue with "grep MHz /proc/cpuinfo" for any
> combination.
>
> Examples:
>
> No-HWP:
>
> active/powersave:
> doug@s19:~/freq-scalers/trace$ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:2300418
> /sys/devices/system/cpu/cpu10/cpufreq/scaling_cur_freq:0
> /sys/devices/system/cpu/cpu11/cpufreq/scaling_cur_freq:0
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:0
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:0
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:0
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:0
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:0
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq:0
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq:2300006
> /sys/devices/system/cpu/cpu8/cpufreq/scaling_cur_freq:2300005
> /sys/devices/system/cpu/cpu9/cpufreq/scaling_cur_freq:0

That's because after the changes in this series scaling_cur_freq
returns 0 if the given CPU is idle.

I guess it could return the last known result, but that wouldn't be
more meaningful.
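
To illustrate the arithmetic behind that behaviour, here is a minimal
userspace sketch, not the code from the series. It assumes MPERF
advances at a fixed reference rate (called base_khz here) and APERF at
the actual rate, so the average frequency over a sample window is
base_khz * delta_aperf / delta_mperf. The struct, the function name and
the cpu_idle flag are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Hypothetical sample: counter deltas between two reads on one CPU. */
struct aperfmperf_delta {
	uint64_t aperf;	/* cycles counted at the actual frequency */
	uint64_t mperf;	/* cycles counted at the reference frequency */
};

/*
 * Estimate the average frequency in kHz over the sample window.
 *
 * Returns 0 when the CPU is flagged idle or when MPERF did not advance,
 * mirroring the "idle CPU reads as 0" behaviour described above.
 * Assumption: base_khz * aperf fits in 64 bits, which holds for the
 * short windows (about one tick) this sketch has in mind.
 */
static unsigned int estimate_khz(const struct aperfmperf_delta *d,
				 unsigned int base_khz, int cpu_idle)
{
	if (cpu_idle || d->mperf == 0)
		return 0;

	return (unsigned int)((uint64_t)base_khz * d->aperf / d->mperf);
}
```

Returning the last known value for an idle CPU instead would mean
caching the previous result per CPU; the sketch leaves that out, since
such a value would describe a window in which the CPU was not idle.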