[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGETcx-sYUAE-Pgnf20ZXR_9WJOZ5FqksL=8i8+9PCLLazy54w@mail.gmail.com>
Date: Tue, 16 Jan 2024 15:47:57 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Marc Zyngier <maz@...nel.org>
Cc: David Dai <davidai@...gle.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>, Rob Herring <robh+dt@...nel.org>,
Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>, Conor Dooley <conor+dt@...nel.org>,
Sudeep Holla <sudeep.holla@....com>, Quentin Perret <qperret@...gle.com>,
Masami Hiramatsu <mhiramat@...gle.com>, Will Deacon <will@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, Vincent Guittot <vincent.guittot@...aro.org>,
Oliver Upton <oliver.upton@...ux.dev>, Dietmar Eggemann <dietmar.eggemann@....com>,
Pavan Kondeti <quic_pkondeti@...cinc.com>, Gupta Pankaj <pankaj.gupta@....com>,
Mel Gorman <mgorman@...e.de>, kernel-team@...roid.com, linux-pm@...r.kernel.org,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/2] dt-bindings: cpufreq: add virtual cpufreq device
On Sat, Jan 13, 2024 at 1:37 AM Marc Zyngier <maz@...nel.org> wrote:
>
> On Fri, 12 Jan 2024 22:02:39 +0000,
> Saravana Kannan <saravanak@...gle.com> wrote:
> >
> > Sorry for the delay in response. Was very busy for a while and then
> > holidays started.
> >
> > On Fri, Dec 8, 2023 at 12:52 AM Marc Zyngier <maz@...nel.org> wrote:
> > >
> > > On Thu, 07 Dec 2023 22:44:36 +0000,
> > > Saravana Kannan <saravanak@...gle.com> wrote:
> > > >
> > > > On Wed, Nov 15, 2023 at 12:49 AM Marc Zyngier <maz@...nel.org> wrote:
> > > > >
> > > > > On Sat, 11 Nov 2023 01:49:29 +0000,
> > > > > David Dai <davidai@...gle.com> wrote:
> > > > > >
> > > > > > Adding bindings to represent a virtual cpufreq device.
> > > > > >
> > > > > > Virtual machines may expose MMIO regions for a virtual cpufreq device
> > > > > > for guests to read frequency information or to request frequency
> > > > > > selection. The virtual cpufreq device has an individual controller for
> > > > > > each frequency domain.
> > > > >
> > > > > I would really refrain form having absolute frequencies here. A
> > > > > virtual machine can be migrated, and there are *zero* guarantees that
> > > > > the target system has the same clock range as the source.
> > > > >
> > > > > This really should be a relative number, much like the capacity. That,
> > > > > at least, can be migrated across systems.
> > > >
> > > > There's nothing in this patch that mandates absolute frequency.
> > > > In true KVM philosophy, we leave it to the VMM to decide.
> > >
> > > This has nothing to do with KVM. It would apply to any execution
> > > environment, including QEMU in TCG mode.
> > >
> > > To quote the original patch:
> > >
> > > + description:
> > > + Address and size of region containing frequency controls for each of the
> > > + frequency domains. Regions for each frequency domain is placed
> > > + contiugously and contain registers for controlling DVFS(Dynamic Frequency
> > > + and Voltage) characteristics. The size of the region is proportional to
> > > + total number of frequency domains.
> > >
> > > What part of that indicates that *relative* frequencies are
> > > acceptable? The example explicitly uses the opp-v2 binding, which
> > > clearly is about absolute frequency.
> >
> > We can update the doc to make that clearer and update the example too.
> >
> > > To reiterate: absolute frequencies are not the right tool for the job,
> > > and they should explicitly be described as relative in the spec. Not
> > > left as a "whatev'" option for the execution environment to interpret.
> >
> > I think it depends on the use case. If there's no plan to migrate the
> > VM across different devices, there's no need to do the unnecessary
> > normalization back and forth.
>
> VM migration is a given, specially when QEMU is involved. Designing
> something that doesn't support it is a bug, plain and simple.
I'm not saying this patch series doesn't allow migrating. I'm just
pointing out that normalization might not always be worth it for a VMM
to do.
We'll update the example and documentation to used normalize values.
CPUfreq itself used KHz for the tables and we can't really change that
when we are emulating CPU freq scaling. Same goes for OPP table DT
property names.
But the values we use can be 0 to 1024 Hz and be normalized.
> > And if we can translate between pCPU frequency and a normalized
> > frequency, we can do the same for whatever made up frequencies too. In
> > fact, we plan to do exactly that in our internal use cases for this.
> > There's nothing here that prevents the VMM from doing that.
> >
> > Also, if there are hardware virtualized performance counters (AMU,
> > CPPC, etc) that are used for frequency normalization, then we have to
> > use the real frequencies in those devices otherwise the "current
> > frequency" can be 2 GHz while the max normalized frequency is 1024
> > KHz. That'll mess up load tracking.
>
> And that's exactly why this shouldn't be a *frequency*, but a
> performance scale or some other unit-less coefficient. Just like the
> big-little capacity.
Sorry I wasn't cleared in my explanation. Let me explain it better.
When performance counters are available, the scheduler uses them to
compute the current average CPU frequency over a scheduler tick. It
looks at the performance counters to figure out how many CPU cycles
have gone by and how much time has gone by and does (delta cycles /
delta seconds) to compute current CPU freq in Hz. So, when a HW
virtualized performance counter is available, and the scheduler inside
the VM uses it to compute the average CPU frequency, the resulting
frequency is going to be the real physical CPU frequency. And the CPU
frequency value the scheduler used to compute the PELT load will be
completely different from the normalized values and the load tracking
will be completely wrong.
Using their performance counters inside the VM reduces a lot of MMIO
access exits to the host or memory accesses. So it makes sense a VM
might want to use that. I was just trying to say that in that
situation, a VMM might have a good reason to just use the real
frequencies inside the VM too.
In any case, we can update the doc to use normalized values/examples
and call out this caveat.
Thanks,
Saravana
Powered by blists - more mailing lists