Message-ID: <438c96fd-bcb0-4699-b81b-40f800cedca0@arm.com>
Date:   Mon, 13 Nov 2023 12:20:29 +0000
From:   Hongyan Xia <hongyan.xia2@....com>
To:     David Dai <davidai@...gle.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Rob Herring <robh+dt@...nel.org>,
        Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
        Conor Dooley <conor+dt@...nel.org>,
        Sudeep Holla <sudeep.holla@....com>,
        Saravana Kannan <saravanak@...gle.com>
Cc:     Quentin Perret <qperret@...gle.com>,
        Masami Hiramatsu <mhiramat@...gle.com>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Marc Zyngier <maz@...nel.org>,
        Oliver Upton <oliver.upton@...ux.dev>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Pavan Kondeti <quic_pkondeti@...cinc.com>,
        Gupta Pankaj <pankaj.gupta@....com>,
        Mel Gorman <mgorman@...e.de>, kernel-team@...roid.com,
        linux-pm@...r.kernel.org, devicetree@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/2] Improve VM CPUfreq and task placement behavior

Hi David,

On 11/11/2023 01:49, David Dai wrote:
> Hi,
> 
> This patch series is a continuation of the talk Saravana gave at LPC 2022
> titled "CPUfreq/sched and VM guest workload problems" [1][2][3]. The gist
> of the talk is that workloads running in a guest VM get terrible task
> placement and CPUfreq behavior when compared to running the same workload
> in the host. Effectively, there is no EAS (Energy Aware Scheduling) for threads
> inside VMs. This would make power and performance terrible just by running
> the workload in a VM even if we assume there is zero virtualization
> overhead.
> 
> With this series, a workload running in a VM gets the same task placement
> and CPUfreq behavior as it would when running in the host.
> 
> The idea is to improve VM CPUfreq/sched behavior by:
> - Having guest kernel do accurate load tracking by taking host CPU
>    arch/type and frequency into account.
> - Sharing vCPU frequency requirements with the host so that the
>    host can do proper frequency scaling and task placement on the host side.
> 
> Based on feedback from the RFC v1 proposal[4], we've revised our
> implementation to use MMIO reads and writes to pass information
> to and from the host instead of using hypercalls. In our example, the
> VMM (Virtual Machine Manager) translates the frequency requests into
> uclamp_min and applies it to the vCPU thread as a hint to the host
> kernel.

Sorry for not noticing this series until now.

The problem you are having with uclamp is actually the same one
I'm tackling right now. Basically, my conclusion so far is that uclamp
max aggregation faces quite a few problems, which can easily be solved by
sum aggregation (summing up the clamped utilization values instead of
applying the max uclamp value to the whole rq):

https://lore.kernel.org/all/cover.1696345700.git.Hongyan.Xia2@arm.com/

What you describe as util_guest sounds to me like exactly what uclamp_min
under sum aggregation does. I'm really tempted to ask you to apply my
series and see whether the new uclamp_min does what you want, instead of
introducing a new util_guest signal. If you have no time for this, I can
try to replicate your setup and run the experiments myself.
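
To make the difference concrete, here is a rough standalone sketch of
the two aggregation schemes (not the actual scheduler code; the vtask
struct and helper names are invented for illustration). Max aggregation
clamps the whole rq utilization with the largest per-task clamp values,
whereas sum aggregation adds up each task's individually clamped
utilization:

/*
 * Illustrative sketch only, not the real kernel implementation.
 * "struct vtask" and the helpers below are invented for this example.
 */
struct vtask {
	unsigned long util;        /* task utilization (0..1024) */
	unsigned long uclamp_min;  /* per-task clamp floor */
	unsigned long uclamp_max;  /* per-task clamp ceiling */
};

unsigned long clamp_util(unsigned long v, unsigned long lo,
			 unsigned long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Max aggregation: clamp the summed rq utilization with the maximum
 * uclamp_min/uclamp_max among all runnable tasks, so one boosted task
 * raises the floor for the whole runqueue.
 */
unsigned long rq_util_max_aggregation(const struct vtask *t, int nr)
{
	unsigned long util = 0, min = 0, max = 0;
	int i;

	for (i = 0; i < nr; i++) {
		util += t[i].util;
		if (t[i].uclamp_min > min)
			min = t[i].uclamp_min;
		if (t[i].uclamp_max > max)
			max = t[i].uclamp_max;
	}
	return clamp_util(util, min, max);
}

/*
 * Sum aggregation: clamp each task's utilization individually and sum
 * the results, so a boosted task only contributes its own clamped
 * utilization.
 */
unsigned long rq_util_sum_aggregation(const struct vtask *t, int nr)
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < nr; i++)
		sum += clamp_util(t[i].util, t[i].uclamp_min,
				  t[i].uclamp_max);
	return sum;
}

Under sum aggregation, a vCPU thread with a large uclamp_min only raises
its own contribution, which is essentially the behaviour you describe
for util_guest.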

Also, my knowledge of KVM is limited. May I ask where the vCPU fork
happens? Can't you just set the p->sched_reset_on_fork flag on fork so
that the uclamp values are not carried forward?
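
For reference (and only as an illustration of the existing
sched_setattr(2) interface, not of what your VMM actually does),
userspace can already attach a uclamp_min hint to the calling vCPU
thread and ask that forked children do not inherit it. The function
name and values below are made up for the example:

#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/sched.h>        /* SCHED_NORMAL, SCHED_FLAG_* */
#include <linux/sched/types.h>  /* struct sched_attr */

/* Example only: boost the calling (vCPU) thread with a uclamp_min hint
 * and mark it SCHED_FLAG_RESET_ON_FORK so children start with default
 * scheduling attributes. Note this also (re)sets the policy to
 * SCHED_NORMAL with nice 0. */
int set_vcpu_uclamp_min(uint32_t util_min)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = SCHED_NORMAL;
	attr.sched_flags = SCHED_FLAG_UTIL_CLAMP_MIN |
			   SCHED_FLAG_RESET_ON_FORK;
	attr.sched_util_min = util_min;	/* 0..1024 */

	/* pid 0 targets the calling thread. */
	return syscall(SYS_sched_setattr, 0, &attr, 0);
}

This assumes v5.3+ UAPI headers for struct sched_attr and the
SCHED_FLAG_UTIL_CLAMP_* flags; older glibc has no sched_setattr()
wrapper, hence the raw syscall.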

> 
> [...]
Hongyan
