Message-ID: <20170301151101.GA17141@amt.cnet>
Date:   Wed, 1 Mar 2017 12:11:03 -0300
From:   Marcelo Tosatti <mtosatti@...hat.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Radim Krcmar <rkrcmar@...hat.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>
Subject: Re: [patch 0/3] KVM CPU frequency change hypercalls

On Wed, Mar 01, 2017 at 03:21:32PM +0100, Paolo Bonzini wrote:
> 
> 
> On 28/02/2017 03:45, Marcelo Tosatti wrote:
> > On Fri, Feb 24, 2017 at 04:34:52PM +0100, Paolo Bonzini wrote:
> >>
> >>
> >> On 24/02/2017 14:04, Marcelo Tosatti wrote:
> >>>>>>> What's the current use case, or foreseeable future use case, for
> >>>>>>> save/restore across preemption again? (which would validate the
> >>>>>>> "broken by design" claim).
> >>>>>> Stop a guest that is using cpufreq, start a guest that is not using it.
> >>>>>> The second guest's performance now depends on the state that the first
> >>>>>> guest left in cpufreq.
> >>>>> Nothing forbids the host from implementing switching with the
> >>>>> current hypercall interface: all you need is a scheduler
> >>>>> hook.
> >>>> Can it be done in vcpu_load/vcpu_put?  But you still would have two
> >>>> components (KVM and sysfs) potentially fighting over the frequency, and
> >>>> that's still a bit ugly.
> >>>
> >>> Change the frequency at vcpu_load/vcpu_put? Yes: call into
> >>> cpufreq-userspace. But there is no notion of "per-task frequency" on the
> >>> Linux kernel (which was the starting point of this subthread).
> >>
> >> There isn't, but this patchset is providing a direct path from a task to
> >> cpufreq-userspace.  This is as close as you can get to a per-task frequency.
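The save/restore semantics discussed here (impose the vcpu's requested frequency at vcpu_load, put the previous setting back at vcpu_put) can be sketched as a toy model. This is purely a hypothetical illustration in Python: the real hooks live in KVM, in C, and would call into cpufreq; all class and attribute names below are invented.

```python
# Toy model of per-task frequency switching at vcpu_load/vcpu_put.
# Hypothetical illustration only -- the real implementation would be
# kernel C code calling into the cpufreq-userspace governor.

class PCpu:
    def __init__(self, freq_khz):
        self.freq_khz = freq_khz            # current cpufreq-userspace setting

class VCpu:
    def __init__(self, requested_khz):
        self.requested_khz = requested_khz  # frequency requested via hypercall
        self._saved_khz = None

    def load(self, pcpu):
        """vcpu_load: save the pcpu's frequency, then impose the vcpu's."""
        self._saved_khz = pcpu.freq_khz
        pcpu.freq_khz = self.requested_khz

    def put(self, pcpu):
        """vcpu_put: restore whatever frequency was in place before."""
        pcpu.freq_khz = self._saved_khz

pcpu = PCpu(freq_khz=2_000_000)
vcpu = VCpu(requested_khz=3_400_000)
vcpu.load(pcpu)   # guest runs at its requested frequency
vcpu.put(pcpu)    # host state restored; the next user of the pcpu is unaffected
```

The point of the restore step is exactly the scenario raised above: a second guest (or any other task) scheduled onto the pcpu is not left with whatever frequency the first guest happened to request.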
> > 
> > Cpufreq-userspace is supposed to be used by tasks in userspace.
> > That's why it's called "userspace".
> 
> I think the intended usecase is to have a daemon handling a systemwide
> policy.  Examples are the historical (and now obsolete) users such as
> cpufreqd, cpudyn, powernowd, or cpuspeed.  The user alternatively can
> play the role of the daemon by writing to sysfs.
> 
> I've never seen userspace tasks talking to cpufreq-userspace to set
> their own running frequency.  If DPDK does it, that's nasty in my
> opinion

Please explain in detail what "nasty" means here. I really don't
understand why it's nasty.

>  and we should find an interface that works best for both DPDK
> and KVM.  Which should be done on linux-pm like Rafael suggested.
> 
> >>> But if you configure all CPUs in the system as cpufreq-userspace,
> >>> then some other (userspace program) has to decide the frequency
> >>> for the other CPUs.
> >>>
> >>> Which agent would do that, and why? That's why I initially said "what's
> >>> the use case".
> >>
> >> You could just pin them at the highest non-TurboBoost frequency until a
> >> guest runs.  That's assuming that they are idle and, because of
> >> isol_cpus/nohz_full, they would be almost always in deep C state anyway.
> > 
> > The original claim of the thread was: "this feature (frequency
> > hypercalls) works for the pinned vcpu<->pcpu case, with the pcpu
> > dedicated exclusively to the vcpu; let's try to extend this to
> > other cases".
> > 
> > Which is a valid and useful direction to go.
> > 
> > However, there is currently no user for multiple vcpus on the same pcpu.
> 
> You are still ignoring the case of one guest started after another, or
> of another program started on a CPU that formerly was used by KVM.  They
> don't have to be multiple users at the same time.

Just have the cpufreq-userspace policy be instantiated while the
isolated vcpu owns the pcpu. Before and after that, the previous policy
remains in place.
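On the host side, that policy switch could look like the sketch below: put the pcpu under the userspace governor only for the lifetime of the vcpu's ownership, restoring the previously configured governor afterwards. The `scaling_governor` and `scaling_setspeed` attribute names are the standard cpufreq sysfs interface; the wrapper itself is a hypothetical illustration (it would need root on a real system, and the `base` parameter exists only so the logic can be exercised against a fake sysfs tree).

```python
# Sketch: pin a pcpu under cpufreq-userspace only while an isolated
# vcpu owns it, then restore the previous governor.  Hypothetical
# management code around the real cpufreq sysfs attributes.
import os
from contextlib import contextmanager

CPUFREQ_BASE = "/sys/devices/system/cpu"   # overridable for testing

def attr_path(cpu, attr, base=CPUFREQ_BASE):
    return os.path.join(base, f"cpu{cpu}", "cpufreq", attr)

def read_attr(cpu, attr, base=CPUFREQ_BASE):
    with open(attr_path(cpu, attr, base)) as f:
        return f.read().strip()

def write_attr(cpu, attr, value, base=CPUFREQ_BASE):
    with open(attr_path(cpu, attr, base), "w") as f:
        f.write(value)

@contextmanager
def userspace_governor(cpu, khz, base=CPUFREQ_BASE):
    """While active, pin `cpu` to `khz` under cpufreq-userspace;
    on exit, the previously configured governor is back in place."""
    prev = read_attr(cpu, "scaling_governor", base)
    write_attr(cpu, "scaling_governor", "userspace", base)
    write_attr(cpu, "scaling_setspeed", str(khz), base)
    try:
        yield
    finally:
        write_attr(cpu, "scaling_governor", prev, base)
```

A manager daemon would enter the context when it dedicates the pcpu to the vcpu and leave it when the vcpu is torn down, so other guests or tasks never see the userspace governor.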

> > If there were multiple vcpus, all of them requesting a given
> > frequency, it would be necessary to:
> > 
> > 	1) Keep the pcpu pinned at its highest
> > 	   frequency.
> > 
> > 		OR
> > 
> > 	2) Not switch at all: since switching frequencies can take up
> > 	   to 70us (*) (depending on the processor), it's generally not
> > 	   worthwhile to switch frequencies between task switches.
> 
> Is latency really the issue here, or is it rather the overhead that we
> should pay attention to?  The slides you linked
> (http://www.ena-hpc.org/2013/pdf/04.pdf) suggest on page 17 that it's
> around 10us.

OK, suppose it is 10us. A 10us overhead on every task context switch is
still not acceptable.
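The arithmetic behind that objection is straightforward: a fixed per-switch cost becomes a measurable fraction of CPU time as soon as context switches are frequent. The switch rate below is an assumed example figure, not a measurement.

```python
# Back-of-the-envelope cost of changing frequency on every task switch.
# The 1000 switches/s rate is an assumed example, not a measured number.

def switch_overhead(cost_us, switches_per_sec):
    """Fraction of CPU time lost to frequency changes at task switch."""
    return cost_us * 1e-6 * switches_per_sec

print(switch_overhead(10, 1000))   # 10us at 1000 switches/s -> 1% of CPU time
print(switch_overhead(70, 1000))   # the 70us worst case quoted above -> 7%
```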

> One possibility is to do (1) if you have multiple tasks on the run queue
> (or fallback to what is specified in sysfs) and (2) if you only have one
> task.

Sure, that is fine. But the use case at hand does not involve
multiple tasks on the pcpu.
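The two-case policy proposed above (pin high, or fall back to sysfs, when several tasks share the runqueue; honor the vcpu's request when it runs alone) can be encoded in a few lines. This is a hypothetical sketch: the function name and the fallback value are invented for illustration.

```python
# Sketch of the proposed policy: honor the vcpu's requested frequency
# only when it is alone on the runqueue, otherwise pin the pcpu high.
# MAX_NON_TURBO_KHZ is an assumed placeholder value.

MAX_NON_TURBO_KHZ = 3_000_000   # assumed "highest non-TurboBoost" frequency

def choose_frequency(runnable_tasks, requested_khz,
                     fallback_khz=MAX_NON_TURBO_KHZ):
    """Return the frequency to program for a pcpu."""
    if runnable_tasks <= 1 and requested_khz is not None:
        return requested_khz    # single task: honor the hypercall request
    return fallback_khz         # contention: pin high / follow sysfs policy
```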

> Anyway, please repost with Cc to linux-pm so that we can restart the
> discussion there.
> 
> Paolo

Done. Can you please reply with a concise summary of what you object to? 
