Message-ID: <20170301151101.GA17141@amt.cnet>
Date: Wed, 1 Mar 2017 12:11:03 -0300
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Radim Krcmar <rkrcmar@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Viresh Kumar <viresh.kumar@...aro.org>
Subject: Re: [patch 0/3] KVM CPU frequency change hypercalls
On Wed, Mar 01, 2017 at 03:21:32PM +0100, Paolo Bonzini wrote:
>
>
> On 28/02/2017 03:45, Marcelo Tosatti wrote:
> > On Fri, Feb 24, 2017 at 04:34:52PM +0100, Paolo Bonzini wrote:
> >>
> >>
> >> On 24/02/2017 14:04, Marcelo Tosatti wrote:
> >>>>>>> What's the current use case, or foreseeable future use case, for
> >>>>>>> save/restore across preemption again? (Which would validate the
> >>>>>>> "broken by design" claim.)
> >>>>>> Stop a guest that is using cpufreq, start a guest that is not using it.
> >>>>>> The second guest's performance now depends on the state that the first
> >>>>>> guest left in cpufreq.
> >>>>> Nothing forbids the host from implementing switching with the
> >>>>> current hypercall interface: all you need is a scheduler
> >>>>> hook.
> >>>> Can it be done in vcpu_load/vcpu_put? But you still would have two
> >>>> components (KVM and sysfs) potentially fighting over the frequency, and
> >>>> that's still a bit ugly.
> >>>
> >>> Change the frequency at vcpu_load/vcpu_put? Yes: call into
> >>> cpufreq-userspace. But there is no notion of "per-task frequency" in the
> >>> Linux kernel (which was the starting point of this subthread).
> >>
> >> There isn't, but this patchset is providing a direct path from a task to
> >> cpufreq-userspace. This is as close as you can get to a per-task frequency.
> >
> > Cpufreq-userspace is supposed to be used by tasks in userspace.
> > That's why it's called "userspace".
>
> I think the intended use case is to have a daemon handling a systemwide
> policy. Examples are the historical (and now obsolete) users such as
> cpufreqd, cpudyn, powernowd, or cpuspeed. Alternatively, the user can
> play the role of the daemon by writing to sysfs.
>
> I've never seen userspace tasks talking to cpufreq-userspace to set
> their own running frequency. If DPDK does it, that's nasty in my
> opinion
Please explain in detail what "nasty" means here. I really don't understand
why it's nasty.
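
For concreteness, with the userspace governor this is all a task like
DPDK ends up doing (a minimal sketch; the CPU number and frequency are
made-up examples, and scaling_governor must already be set to
"userspace" for scaling_setspeed to be writable):

	/* pin cpu2 to 2.2 GHz via cpufreq-userspace (value is in kHz) */
	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/sys/devices/system/cpu/cpu2/cpufreq/"
				"scaling_setspeed", "w");

		if (!f) {
			perror("fopen");
			return 1;
		}
		fprintf(f, "%u\n", 2200000U);
		fclose(f);
		return 0;
	}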
> and we should find an interface that works best for both DPDK
> and KVM. Which should be done on linux-pm like Rafael suggested.
>
> >>> But if you configure all CPUs in the system as cpufreq-userspace,
> >>> then some other agent (a userspace program) has to decide the frequency
> >>> for the other CPUs.
> >>>
> >>> Which agent would do that, and why? That's why I initially said "what's
> >>> the use case".
> >>
> >> You could just pin them at the highest non-TurboBoost frequency until a
> >> guest runs. That's assuming that they are idle and, because of
> >> isol_cpus/nohz_full, they would be almost always in deep C state anyway.
> >
> > The original claim of the thread was: "this feature (frequency
> > hypercalls) works for the pinned vcpu<->pcpu case, with the pcpu
> > dedicated exclusively to the vcpu; let's try to extend it to other cases".
> >
> > Which is a valid and useful direction to go.
> >
> > However, there is currently no user for multiple vcpus on the same pcpu.
>
> You are still ignoring the case of one guest started after another, or
> of another program started on a CPU that formerly was used by KVM. They
> don't have to be multiple users at the same time.
Just have the cpufreq-userspace policy instantiated only while the
isolated vcpu owns the pcpu; before and after that, the previous policy
is in place.
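
i.e. something along these lines on the host side (hypothetical
management-stack helpers, sketch only; the only kernel interface used
is the standard cpufreq sysfs one):

	#include <stdio.h>
	#include <string.h>

	static void gov_path(char *buf, size_t len, int cpu)
	{
		snprintf(buf, len, "/sys/devices/system/cpu/cpu%d"
			 "/cpufreq/scaling_governor", cpu);
	}

	/* remember the governor that was active before the guest */
	static int read_governor(int cpu, char *gov, int len)
	{
		char path[128];
		FILE *f;

		gov_path(path, sizeof(path), cpu);
		f = fopen(path, "r");
		if (!f)
			return -1;
		if (!fgets(gov, len, f)) {
			fclose(f);
			return -1;
		}
		gov[strcspn(gov, "\n")] = '\0';
		fclose(f);
		return 0;
	}

	static int write_governor(int cpu, const char *gov)
	{
		char path[128];
		FILE *f;

		gov_path(path, sizeof(path), cpu);
		f = fopen(path, "w");
		if (!f)
			return -1;
		fprintf(f, "%s\n", gov);
		fclose(f);
		return 0;
	}

	/* when the isolated vcpu is given the pcpu ... */
	void pcpu_enter_guest_mode(int cpu, char *saved, int len)
	{
		read_governor(cpu, saved, len);
		write_governor(cpu, "userspace");
	}

	/* ... and when it gives the pcpu back */
	void pcpu_leave_guest_mode(int cpu, const char *saved)
	{
		write_governor(cpu, saved);
	}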
> > If there were multiple vcpus, all of them requesting a given
> > frequency, it would be necessary to either:
> >
> > 1) Keep the pcpu at the highest of the requested
> > frequencies,
> >
> > OR
> >
> > 2) Accept that, since switching frequencies can take up to 70us (*)
> > (depending on the processor), it's generally not worthwhile
> > to switch frequencies between task switches.
>
> Is latency that important, or is it rather the overhead that needs
> attention? The slides you linked
> (http://www.ena-hpc.org/2013/pdf/04.pdf) suggest on page 17 that it's
> around 10us.
OK, say it is 10us. A 10us overhead on every task context switch is still
not acceptable.
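
(Back-of-the-envelope: assuming, for the sake of argument, ~10,000
context switches per second on that pcpu, 10us of frequency-change
latency per switch is 10us * 10000 = 100ms out of every second, i.e.
roughly 10% of the CPU spent just reprogramming P-states, before even
counting the cpufreq code itself.)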
> One possibility is to do (1) if you have multiple tasks on the run queue
> (or fall back to what is specified in sysfs) and (2) if you only have one
> task.
Sure, that is all right. But the use case at hand does not involve
multiple tasks on the pcpu.
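
For the record, the rule you describe would look roughly like this
(pseudo-logic only; nr_running, the frequencies and the helper name are
placeholders, not actual KVM/cpufreq interfaces):

	/* case (1): several runnable tasks -> highest/sysfs frequency;
	 * case (2): single task -> honor the vcpu's requested frequency */
	unsigned int pick_frequency(unsigned int nr_running,
				    unsigned int requested_khz,
				    unsigned int max_khz,
				    unsigned int sysfs_khz)
	{
		if (nr_running > 1)
			return max_khz ? max_khz : sysfs_khz;
		return requested_khz;
	}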
> Anyway, please repost with Cc to linux-pm so that we can restart the
> discussion there.
>
> Paolo
Done. Can you please reply with a concise summary of what you object to?