Message-ID: <Zdb68uUMUdQS2Scw@intel.com>
Date: Thu, 22 Feb 2024 15:42:42 +0800
From: Zhao Liu <zhao1.liu@...ux.intel.com>
To: Paolo Bonzini <pbonzini@...hat.com>,
	Sean Christopherson <seanjc@...gle.com>
Cc: "Rafael J . Wysocki" <rafael@...nel.org>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H . Peter Anvin" <hpa@...or.com>, kvm@...r.kernel.org,
	linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
	x86@...nel.org,
	Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
	Len Brown <len.brown@...el.com>, Zhang Rui <rui.zhang@...el.com>,
	Zhenyu Wang <zhenyu.z.wang@...el.com>,
	Zhuocheng Ding <zhuocheng.ding@...el.com>,
	Dapeng Mi <dapeng1.mi@...el.com>,
	Yanting Jiang <yanting.jiang@...el.com>,
	Yongwei Ma <yongwei.ma@...el.com>,
	Vineeth Pillai <vineeth@...byteword.org>,
	Suleiman Souhlal <suleiman@...gle.com>,
	Masami Hiramatsu <mhiramat@...gle.com>,
	David Dai <davidai@...gle.com>,
	Saravana Kannan <saravanak@...gle.com>,
	Zhao Liu <zhao1.liu@...el.com>
Subject: Re: [RFC 00/26] Intel Thread Director Virtualization

Ping Paolo & Sean,

Do you have any comments? Or do you think ITD virtualization would be
appropriate to discuss at PUCK?

Thanks,
Zhao

On Sat, Feb 03, 2024 at 05:11:48PM +0800, Zhao Liu wrote:
> Date: Sat, 3 Feb 2024 17:11:48 +0800
> From: Zhao Liu <zhao1.liu@...ux.intel.com>
> Subject: [RFC 00/26] Intel Thread Director Virtualization
> X-Mailer: git-send-email 2.34.1
> 
> From: Zhao Liu <zhao1.liu@...el.com>
> 
> Hi list,
> 
> This is our RFC to virtualize the Intel Thread Director (ITD) feature
> for the Guest. It is based on Ricardo's patch series about ITD-related
> support in the HFI driver ("[PATCH 0/9] thermal: intel: hfi: Prework
> for the virtualization of HFI" [1]).
> 
> In short, the purpose of this patch set is to enable ITD-based
> scheduling logic in the Guest so that the Guest can better schedule
> its tasks on Intel hybrid platforms.
> 
> Currently, ITD is necessary for Windows VMs: with ITD virtualization
> support, a Windows 11 Guest can see a significant performance
> improvement (for example, on an i9-13900K, up to 14%+ improvement on
> 3DMARK).
> 
> Our ITD virtualization is not bound to the VM's hybrid topology or the
> vCPUs' CPU affinity. However, in our practice, the ITD scheduling
> optimization for Win11 VMs works best when combined with hybrid
> topology and CPU affinity (this is related to the specific
> implementation of Win11 scheduling). For more details, please see
> Section 1.2, "About hybrid topology and vCPU pinning".
> 
> To enable ITD-related scheduling optimization in a Win11 VM, some
> other thermal-related support is also needed (HWP, CPPC), but we can
> emulate it with dummy values in the VMM (we'll also send out extra
> patches for these in the future).
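> 
> As an illustration only, the dummy handling of such a read in the VMM
> could look like the sketch below (written in KVM style; the handler
> name and the capability value are placeholders, not part of this
> series):
> 
>   /* Sketch: report a fixed, made-up HWP capability value. */
>   static int emulate_dummy_hwp_read(u32 msr, u64 *data)
>   {
>           switch (msr) {
>           case MSR_HWP_CAPABILITIES:
>                   *data = 0x010101ff;  /* dummy value, illustrative */
>                   return 0;
>           default:
>                   return 1;            /* not handled here */
>           }
>   }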
> 
> Feedback is welcome!
> 
> 
> 1. Background and Motivation
> ============================
> 
> 1.1. Background
> ^^^^^^^^^^^^^^^
> 
> Our use case is running games in a client Windows VM as a cloud gaming
> solution.
> 
> Gaming VMs are performance-sensitive VMs on the client, so they
> usually have two characteristics to ensure interactivity and
> performance:
> 
> i) The number of vCPUs is equal to or close to the number of Host
> pCPUs.
> 
> ii) The vCPUs of a gaming VM are often pinned to the pCPUs for
> exclusive resources and to avoid migration overhead.
> 
> In this case, the Host can't schedule the Guest effectively, so we
> need to deliver more hardware-assisted scheduling capabilities to the
> Guest to enhance the Guest's own scheduling.
> 
> Windows 11 (and future Windows products) is heavily optimized for
> Intel hybrid platforms. To get the best performance, we need to
> virtualize the hybrid scheduling features (HFI/ITD) for the Windows
> Guest.
> 
> 
> 1.2. About hybrid topology and vCPU pinning
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Our ITD virtualization can support most vCPU topologies (except
> multiple packages/dies; see details in Section 3.5, "Restrictions on
> Guest Topology"), and can also support non-pinned vCPUs (i.e., it can
> handle vCPU thread migration).
> 
> The following is our performance measurement on an i9-13900K machine
> (2995 MHz, 24 cores, 32 threads (8+16), RAM: 14GB (16GB physical)),
> with iGPU passthrough, running 3DMARK in a Win11 Professional Guest:
> 
> 
> compared with smp topo case       smp topo        smp topo        smp topo      hybrid topo       hybrid topo     hybrid topo     hybrid topo
>                                 + affinity      + ITD           + ITD                           + affinity      + ITD           + ITD
>                                                                 + affinity                                                      + affinity
> Time Spy - Overall                0.179%        -0.250%           0.179%        -0.107%           0.143%        -0.179%         -0.107%
> Graphics score                    0.124%        -0.249%           0.124%        -0.083%           0.124%        -0.166%         -0.249%
> CPU score                         0.916%        -0.485%           1.149%        -0.076%           0.722%        -0.324%         11.915%
> Fire Strike Extreme - Overall     0.149%         0.000%           0.224%        -1.021%          -3.361%        -1.319%         -3.361%
> Graphics score                    0.100%         0.050%           0.150%        -1.376%          -3.427%        -1.676%         -3.652%
> Physics score                     5.060%         0.759%           0.518%        -2.907%         -10.914%        -0.897%         14.638%
> Combined  score                   0.120%        -0.179%           0.418%         0.060%          -2.929%        -0.179%         -2.809%
> Fire Strike - Overall             0.350%        -0.085%           0.193%        -1.377%          -1.365%        -1.509%         -1.787%
> Graphics score                    0.256%        -0.047%           0.210%        -1.527%          -1.376%        -1.504%         -2.320%
> Physics score                     3.695%        -2.180%           0.629%        -1.581%          -6.846%        -1.444%         14.100%
> Combined  score                   0.415%        -0.128%           0.128%        -0.957%          -1.052%        -1.594%         -0.957%
> CPU Profile Max Threads           1.836%         0.298%           1.786%        -0.069%           1.545%         0.025%          9.472%
> 16 Threads                        4.290%         0.989%           3.588%         0.595%           1.580%         0.848%         11.295%
> 8 Threads                       -22.632%        -0.602%         -23.167%        -0.988%          -1.345%        -1.340%          8.648%
> 4 Threads                       -21.598%         0.449%         -21.429%        -0.817%           1.951%        -0.832%          2.084%
> 2 Threads                       -12.912%        -0.014%         -12.006%        -0.481%          -0.609%        -0.595%          1.161%
> 1 Thread                         -3.793%        -0.137%          -3.793%        -0.495%          -3.189%        -0.495%          1.154%
> 
> 
> Based on the above results, we find that exposing only HFI/ITD to
> Win11 VMs without hybrid topology or CPU affinity (the "smp topo +
> ITD" case) doesn't hurt performance, but doesn't bring any performance
> improvement either.
> 
> With both hybrid topology and CPU affinity set for ITD, Win11 VMs get
> a significant performance improvement (up to 14%+, compared with the
> case of SMP topology without CPU affinity).
> 
> Beyond the numerical 3DMARK results, in practice there is a
> significant improvement in the frame rate of the games.
> 
> Also, the more powerful the machine, the more significant the
> performance gains!
> 
> Therefore, the best practice for enabling the ITD scheduling
> optimization is to set up both CPU affinity and hybrid topology for
> the Win11 Guest while enabling our ITD virtualization.
> 
> Our earlier QEMU prototype RFC [2] presented the initial hybrid
> topology support for VMs. Currently, another proposal of ours, "QOM
> topology" [3], has been raised in the QEMU community; it is the first
> step towards a hybrid topology implementation based on the QOM
> approach.
> 
> 
> 2. Introduction of HFI and ITD
> ==============================
> 
> Intel provides the Hardware Feedback Interface (HFI) feature to allow
> the hardware to give the OS scheduler guidance for optimal workload
> scheduling through a hardware feedback interface structure in memory
> [4]. This structure is called the HFI table.
> 
> For now, the guidance includes performance and energy efficiency
> hints, and it can be updated via thermal interrupt as the actual
> operating conditions of the processor change at run time.
> 
> The Intel Thread Director (ITD) feature extends HFI to provide
> performance and energy efficiency data for advanced classes of
> instructions.
> 
> Since ITD is an extension of HFI, our ITD virtualization also
> virtualizes the native HFI feature.
> 
> 
> 3. Dependencies of ITD
> ======================
> 
> ITD is a thermal feature that requires:
> * PTM (Package Thermal Management, alias, PTS)
> * HFI (Hardware Feedback Interface)
> 
> In order to support the notification mechanism for dynamic ITD/HFI
> updates, we also need to add thermal-interrupt-related support,
> including the following two features:
> * ACPI (Thermal Monitor and Software Controlled Clock Facilities)
> * TM (Thermal Monitor, alias, TM1/ACC)
> 
> Therefore, we must also consider support for the emulation of all
> the above dependencies.
> 
> 
> 3.1. ACPI emulation
> ^^^^^^^^^^^^^^^^^^^
> 
> We can support ACPI by emulating RDMSR/WRMSR of the associated MSRs
> and adding the ability to inject thermal interrupts. In fact, we don't
> actually inject thermal interrupts into the Guest for the thermal
> conditions corresponding to ACPI; the thermal interrupt support here
> is prepared for the subsequent HFI/ITD.
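> 
> A rough sketch of the WRMSR side of that emulation (the writable-bits
> mask and the per-vCPU field below are illustrative names, not the
> series' actual ones):
> 
>   /* Sketch: WRMSR emulation for an IA32_THERM_INTERRUPT-style MSR. */
>   static int set_therm_interrupt(struct kvm_vcpu *vcpu, u64 data)
>   {
>           if (data & ~THERM_INT_WRITABLE_MASK)    /* hypothetical mask */
>                   return 1;       /* reserved bit set -> inject #GP */
> 
>           vcpu->arch.therm_interrupt_shadow = data; /* hypothetical */
>           return 0;
>   }
> 
> The injection itself would presumably go through the local APIC's
> thermal LVT (APIC_LVTTHMR), which is why the series touches lapic.c.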
> 
> 
> 3.2. TM emulation
> ^^^^^^^^^^^^^^^^^
> 
> TM is a hardware feature whose CPUID bit only indicates the presence
> of the automatic thermal monitoring facilities. For TM, there's no
> interactive interface between the OS and hardware, but its flag is one
> of the prerequisites for the OS to enable thermal interrupts.
> 
> Therefore, to support TM, it is enough for us to expose its CPUID flag
> to the Guest.
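> 
> In KVM terms, that amounts to a one-liner in the CPUID capability
> setup; something along these lines:
> 
>   /* Sketch: advertise TM (CPUID.0x01.edx[bit 29]) iff the host has it. */
>   kvm_cpu_cap_check_and_set(X86_FEATURE_TM);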
> 
> 
> 3.3. PTM emulation
> ^^^^^^^^^^^^^^^^^^
> 
> PTM is a package-scope feature that includes package-level MSRs and a
> package-level thermal interrupt. Unfortunately, KVM currently only
> supports thread-scope MSR handling and doesn't take the specific Guest
> topology into account.
> 
> However, since our purpose in supporting PTM in KVM is to further
> support ITD, and all current platforms with ITD have a single package,
> we emulate PTM's package-scope MSRs at the VM level.
> 
> This requires the VMM to configure only a single-package topology for
> PTM. To limit the impact of this restriction, we only expose the PTM
> feature bit to the Guest when ITD needs to be supported.
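> 
> A sketch of where such per-VM state could live (pkg_therm_lock is the
> lock documented by patch 26; the MSR shadow fields here are
> illustrative):
> 
>   /* Sketch: package-scope thermal MSR state kept once per VM and
>    * shared by all vCPUs, instead of being duplicated per vCPU. */
>   struct kvm_pkg_therm {
>           u64 pkg_therm_status;        /* IA32_PACKAGE_THERM_STATUS */
>           u64 pkg_therm_interrupt;     /* IA32_PACKAGE_THERM_INTERRUPT */
>           struct mutex pkg_therm_lock; /* serializes cross-vCPU access */
>   };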
> 
> 
> 3.4. HFI emulation
> ^^^^^^^^^^^^^^^^^^
> 
> ITD is an extension of HFI, so both HFI and ITD depend on the HFI
> table. HFI itself is used on the Host for power-related management and
> control, so we should only expose HFI to the Guest when we need to
> enable ITD.
> 
> HFI also relies on PTM interrupt control, so it likewise has
> requirements on the package topology, and we also emulate HFI
> (including ITD) at the VM level.
> 
> In addition, because the HFI driver allocates HFI instances per die,
> this also constrains HFI (and ITD): the Guest must be limited to a
> single die.
> 
> 
> 3.5. Restrictions on Guest Topology
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Due to KVM's incomplete support for MSR topology and the requirements
> of HFI instance management in the kernel, PTM, HFI, and ITD restrict
> the Guest's topology (mainly restricting the topology types that can
> be created on the VMM side).
> 
> Therefore, we only expose PTM, HFI, and ITD to userspace when we need
> to support ITD. At the same time, considering that ITD is currently
> only used on client platforms with 1 package and 1 die, such temporary
> restrictions will not have much impact.
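> 
> On the VMM side this boils down to a simple gate before the feature
> bits are advertised; a sketch with hypothetical topology parameters:
> 
>   /* Sketch: only advertise PTM/HFI/ITD for a 1-package, 1-die VM. */
>   static bool guest_topo_allows_itd(unsigned int nr_packages,
>                                     unsigned int nr_dies)
>   {
>           return nr_packages == 1 && nr_dies == 1;
>   }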
> 
> 
> 4. Overview of ITD (and HFI) virtualization
> ===========================================
> 
> The main tasks of ITD (including HFI) virtualization are:
> * maintain a virtual HFI table for VM.
> * inject thermal interrupt when HFI table updates.
> * handle related MSRs' emulation and adjust HFI table based on MSR's
>   control bits.
> * expose ITD/HFI configuration info in related CPUID leaves.
> 
> The most important of these is the maintenance of the virtual HFI
> table. Although the HFI table should also be per package, since the
> ITD/HFI-related MSRs are treated as per-VM in KVM, we also treat the
> virtual HFI table as per-VM.
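> 
> Since the table is per-VM, an update has to reach every vCPU, and
> KVM's request mechanism is a natural fit for that. A sketch, with a
> hypothetical request bit (the series defines its own):
> 
>   #define KVM_REQ_HFI_UPDATE KVM_ARCH_REQ(33)  /* hypothetical bit */
> 
>   static void kvm_hfi_request_update(struct kvm *kvm)
>   {
>           /* Ask every vCPU to refresh its view of the virtual HFI
>            * table (and raise a thermal interrupt) on next entry. */
>           kvm_make_all_cpus_request(kvm, KVM_REQ_HFI_UPDATE);
>   }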
> 
> 
> 4.1. HFI table building
> ^^^^^^^^^^^^^^^^^^^^^^^
> 
> The HFI table contains a table header and many table entries. Each
> table entry is identified by an HFI table index, and each CPU
> corresponds to one of the HFI table indexes.
> 
> The ITD and HFI features both depend on the HFI table, but their HFI
> tables differ slightly. The HFI table provided by the ITD feature has
> more classes (i.e., more columns in the table) than the HFI table of
> the native HFI feature.
> 
> The virtual HFI table in KVM is built from the actual HFI table, which
> is maintained by an HFI instance in the HFI driver. We extract the HFI
> data of the pCPUs that the vCPUs are running on to form the virtual
> HFI table.
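> 
> For reference, a sketch of the layout, modeled on the structures the
> host HFI driver uses in drivers/thermal/intel/intel_hfi.c:
> 
>   /* One capability pair per class; native HFI has a single class,
>    * ITD adds more classes (i.e. more columns per entry). */
>   struct hfi_cpu_data {
>           u8 perf_cap;      /* performance capability, 0..255 */
>           u8 ee_cap;        /* energy efficiency capability, 0..255 */
>   } __packed;
> 
>   struct hfi_hdr {
>           u8 perf_updated;  /* perf column changed since last read */
>           u8 ee_updated;    /* ee column changed since last read */
>   } __packed;
> 
>   /* Table layout: a u64 timestamp, then one hfi_hdr per class, then,
>    * for each HFI table index, one hfi_cpu_data per class. */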
> 
> 
> 4.2. HFI table index
> ^^^^^^^^^^^^^^^^^^^^
> 
> There are many entries in the HFI table, and each vCPU is assigned an
> HFI table index to specify the entry it maps to. KVM fills the HFI
> data of the pCPU that a vCPU is running on into the entry
> corresponding to that vCPU's HFI table index in the virtual HFI table.
> 
> This index is set by the VMM in CPUID.
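> 
> For illustration, this is how the VMM-assigned index looks from inside
> the Guest (CPUID.0x06.edx[31:16], per the SDM); the snippet assumes
> the task is pinned to the vCPU of interest:
> 
>   #include <cpuid.h>    /* GCC/Clang CPUID helpers */
>   #include <stdint.h>
>   #include <stdio.h>
> 
>   int main(void)
>   {
>           uint32_t eax, ebx, ecx, edx;
> 
>           if (!__get_cpuid_count(0x6, 0, &eax, &ebx, &ecx, &edx))
>                   return 1;
>           /* edx[31:16]: this logical processor's HFI table index. */
>           printf("HFI table index: %u\n", edx >> 16);
>           return 0;
>   }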
> 
> 
> 4.3. HFI table updating
> ^^^^^^^^^^^^^^^^^^^^^^^
> 
> On some platforms, the HFI table is dynamically updated via thermal
> interrupts. In order to update the virtual HFI table in time, we add a
> per-VM notifier to the HFI driver that notifies KVM to update the VM's
> virtual HFI table and then inject a thermal interrupt into the VM to
> notify the Guest.
> 
> Another case that requires a virtual HFI table update is vCPU
> migration: when a vCPU migrates, the pCPU it runs on changes, and the
> corresponding virtual HFI data should be updated to the new pCPU's
> data. In this case, to reduce overhead, we update only the data of
> that single vCPU instead of traversing the entire virtual HFI table.
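> 
> A sketch of the notifier plumbing on the HFI driver side (the exported
> helper names are illustrative, not necessarily what the patches use):
> 
>   #include <linux/notifier.h>
> 
>   static BLOCKING_NOTIFIER_HEAD(hfi_update_chain);
> 
>   int hfi_update_notifier_register(struct notifier_block *nb)
>   {
>           return blocking_notifier_chain_register(&hfi_update_chain, nb);
>   }
> 
>   /* Called from the HFI driver's update path once the host table has
>    * been refreshed; KVM's callback would then rebuild the per-VM
>    * virtual table and inject a thermal interrupt into the Guest. */
>   static void hfi_notify_update(void)
>   {
>           blocking_notifier_call_chain(&hfi_update_chain, 0, NULL);
>   }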
> 
> 
> 5. Patch Summary
> ================
> 
> Patch 01-03: Prepare the bit definition, the hfi helpers and hfi data
>              structures that KVM needs.
> Patch 04-05: Add the sched_out arch hook and reset the classification
>              history at sched_in()/sched_out().
> Patch 06-10: Add emulations of ACPI, TM and PTM, mainly about CPUID and
>              related MSRs.
> Patch 11-20: Add the emulation support for HFI, including maintaining
>              the HFI table for VM.
> Patch 21-23: Add the emulation support for ITD, including extending HFI
>              to ITD and passing through the classification MSRs.
> Patch 24-25: Add HRESET emulation support, which is also used by the
>              IPC classes feature.
> Patch 26:    Add the brief doc about the per-VM lock - pkg_therm_lock.
> 
> 
> 6. References
> =============
> 
> [1]: [PATCH 0/9] thermal: intel: hfi: Prework for the virtualization of HFI
>      https://lore.kernel.org/lkml/20240203040515.23947-1-ricardo.neri-calderon@linux.intel.com/
> [2]: [RFC 00/52] Introduce hybrid CPU topology,
>      https://lore.kernel.org/qemu-devel/20230213095035.158240-1-zhao1.liu@linux.intel.com/
> [3]: [RFC 00/41] qom-topo: Abstract Everything about CPU Topology,
>      https://lore.kernel.org/qemu-devel/20231130144203.2307629-1-zhao1.liu@linux.intel.com/
> [4]: SDM, vol. 3B, section 15.6 HARDWARE FEEDBACK INTERFACE AND INTEL
>      THREAD DIRECTOR
> 
> 
> Thanks and Best Regards,
> Zhao
> ---
> Zhao Liu (17):
>   thermal: Add bit definition for x86 thermal related MSRs
>   KVM: Add kvm_arch_sched_out() hook
>   KVM: x86: Reset hardware history at vCPU's sched_in/out
>   KVM: VMX: Add helpers to handle the writes to MSR's R/O and R/WC0 bits
>   KVM: x86: cpuid: Define CPUID 0x06.eax by kvm_cpu_cap_mask()
>   KVM: VMX: Introduce HFI description structure
>   KVM: VMX: Introduce HFI table index for vCPU
>   KVM: x86: Introduce the HFI dynamic update request and kvm_x86_ops
>   KVM: VMX: Allow to inject thermal interrupt without HFI update
>   KVM: VMX: Emulate HFI related bits in package thermal MSRs
>   KVM: VMX: Emulate the MSRs of HFI feature
>   KVM: x86: Expose HFI feature bit and HFI info in CPUID
>   KVM: VMX: Extend HFI table and MSR emulation to support ITD
>   KVM: VMX: Pass through ITD classification related MSRs to Guest
>   KVM: x86: Expose ITD feature bit and related info in CPUID
>   KVM: VMX: Emulate the MSR of HRESET feature
>   Documentation: KVM: Add description of pkg_therm_lock
> 
> Zhuocheng Ding (9):
>   thermal: intel: hfi: Add helpers to build HFI/ITD structures
>   thermal: intel: hfi: Add HFI notifier helpers to notify HFI update
>   KVM: VMX: Emulate ACPI (CPUID.0x01.edx[bit 22]) feature
>   KVM: x86: Expose TM/ACC (CPUID.0x01.edx[bit 29]) feature bit to VM
>   KVM: VMX: Emulate PTM/PTS (CPUID.0x06.eax[bit 6]) feature
>   KVM: VMX: Support virtual HFI table for VM
>   KVM: VMX: Sync update of Host HFI table to Guest
>   KVM: VMX: Update HFI table when vCPU migrates
>   KVM: x86: Expose HRESET feature's CPUID to Guest
> 
>  Documentation/virt/kvm/locking.rst  |  13 +-
>  arch/arm64/include/asm/kvm_host.h   |   1 +
>  arch/mips/include/asm/kvm_host.h    |   1 +
>  arch/powerpc/include/asm/kvm_host.h |   1 +
>  arch/riscv/include/asm/kvm_host.h   |   1 +
>  arch/s390/include/asm/kvm_host.h    |   1 +
>  arch/x86/include/asm/hfi.h          |  28 ++
>  arch/x86/include/asm/kvm-x86-ops.h  |   3 +-
>  arch/x86/include/asm/kvm_host.h     |   2 +
>  arch/x86/include/asm/msr-index.h    |  54 +-
>  arch/x86/kvm/cpuid.c                | 201 +++++++-
>  arch/x86/kvm/irq.h                  |   1 +
>  arch/x86/kvm/lapic.c                |   9 +
>  arch/x86/kvm/svm/svm.c              |   8 +
>  arch/x86/kvm/vmx/vmx.c              | 751 +++++++++++++++++++++++++++-
>  arch/x86/kvm/vmx/vmx.h              |  79 ++-
>  arch/x86/kvm/x86.c                  |  18 +
>  drivers/thermal/intel/intel_hfi.c   | 212 +++++++-
>  drivers/thermal/intel/therm_throt.c |   1 -
>  include/linux/kvm_host.h            |   1 +
>  virt/kvm/kvm_main.c                 |   1 +
>  21 files changed, 1343 insertions(+), 44 deletions(-)
> 
> -- 
> 2.34.1
> 
