Message-ID: <b2624e84-6fab-44a3-affc-ce0847cd3da4@suse.com>
Date: Tue, 22 Apr 2025 11:57:01 +0200
From: Jürgen Groß <jgross@...e.com>
To: "Xin Li (Intel)" <xin@...or.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, linux-perf-users@...r.kernel.org,
linux-hyperv@...r.kernel.org, virtualization@...ts.linux.dev,
linux-pm@...r.kernel.org, linux-edac@...r.kernel.org,
xen-devel@...ts.xenproject.org, linux-acpi@...r.kernel.org,
linux-hwmon@...r.kernel.org, netdev@...r.kernel.org,
platform-driver-x86@...r.kernel.org
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, acme@...nel.org,
andrew.cooper3@...rix.com, peterz@...radead.org, namhyung@...nel.org,
mark.rutland@....com, alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
irogers@...gle.com, adrian.hunter@...el.com, kan.liang@...ux.intel.com,
wei.liu@...nel.org, ajay.kaher@...adcom.com,
bcm-kernel-feedback-list@...adcom.com, tony.luck@...el.com,
pbonzini@...hat.com, vkuznets@...hat.com, seanjc@...gle.com,
luto@...nel.org, boris.ostrovsky@...cle.com, kys@...rosoft.com,
haiyangz@...rosoft.com, decui@...rosoft.com
Subject: Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism
to write MSR
On 22.04.25 10:22, Xin Li (Intel) wrote:
> The story started from tglx's reply in [1]:
>
> For actual performance relevant code the current PV ops mechanics
> are a horrorshow when the op defaults to the native instruction.
>
> look at wrmsrl():
>
>   wrmsrl(msr, val)
>    wrmsr(msr, (u32)val, (u32)(val >> 32))
>     paravirt_write_msr(msr, low, high)
>       PVOP_VCALL3(cpu.write_msr, msr, low, high)
>
> Which results in
>
> mov $msr, %edi
> mov $val, %rdx
> mov %edx, %esi
> shr $0x20, %rdx
> call native_write_msr
>
> and native_write_msr() does at minimum:
>
> mov %edi,%ecx
> mov %esi,%eax
> wrmsr
> ret
>
> In the worst case 'ret' is going through the return thunk. Not to
> talk about function prologues and whatever.
>
> This becomes even more silly for trivial instructions like STI/CLI
> or in the worst case paravirt_nop().

This is nonsense.

In the non-Xen case the initial indirect call is directly replaced with
STI/CLI via alternative patching, while for Xen it is replaced by a direct
call.

The paravirt_nop() case is handled in alt_replace_call() by replacing the
indirect call with a nop when the target of the call is paravirt_nop()
(which is in fact nop_func()).
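
To illustrate the call-site rewriting described above, here is a minimal
user-space model in C. It is only a sketch, not the kernel's
alt_replace_call(): the name model_patch_call_site(), the fake addresses
and the single-byte NOP padding are assumptions made for the example (the
kernel uses its own NOP sequences and reads the target from the pv_ops
slot).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CALL_SITE_LEN   6 /* ff 15 disp32: indirect call through memory */
#define DIRECT_CALL_LEN 5 /* e8 rel32: direct near call */

/*
 * Hypothetical helper: rewrite one 6-byte indirect call site.
 * Returns the number of meaningful replacement bytes:
 * 0 if the site became pure NOP padding, 5 if it became a direct call.
 */
static int model_patch_call_site(uint8_t site[CALL_SITE_LEN],
                                 uint64_t site_addr,
                                 uint64_t target, uint64_t nop_func)
{
    int32_t rel;

    /* Only "call *disp32(%rip)" sites are handled in this model. */
    assert(site[0] == 0xff && site[1] == 0x15);

    /* Start from all NOPs; this is already the result for a nop target. */
    memset(site, 0x90, CALL_SITE_LEN);
    if (target == nop_func)
        return 0;

    /* rel32 is relative to the end of the 5-byte direct call. */
    rel = (int32_t)(target - (site_addr + DIRECT_CALL_LEN));
    site[0] = 0xe8;
    memcpy(&site[1], &rel, sizeof(rel));
    return DIRECT_CALL_LEN;
}
```

With target equal to nop_func the site ends up as six NOP bytes and the
function returns 0; any other target yields e8 rel32 followed by one NOP.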
>
> The call makes only sense, when the native default is an actual
> function, but for the trivial cases it's a blatant engineering
> trainwreck.

The trivial cases are all handled as stated above: a direct replacement
instruction is placed at the indirect call position.

> Later a consensus was reached to utilize the alternatives mechanism to
> eliminate the indirect call overhead introduced by the pv_ops APIs:
>
> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
> disabled feature, preventing the Xen code from being built
> and ensuring the native code is executed unconditionally.

This is already the case today. No change is needed to have this in
place.
>
> 2) When built with CONFIG_XEN_PV:
>
> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
> the kernel runtime binary is patched to unconditionally
> jump to the native MSR write code.
>
> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
> kernel runtime binary is patched to unconditionally jump
> to the Xen MSR write code.

I can't see what is different here compared to today's state.
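
For reference, the three cases listed in the quoted text boil down to the
following decision. This is just a sketch in plain C with hypothetical
names (select_msr_write_path(), enum msr_write_path); in the kernel the
choice is made by the build configuration and by patching at boot, not by
a runtime branch like this one.

```c
#include <assert.h>
#include <stdbool.h>

enum msr_write_path { MSR_WRITE_NATIVE, MSR_WRITE_XEN };

/*
 * config_xen_pv models CONFIG_XEN_PV (build time),
 * feature_xenpv models X86_FEATURE_XENPV (detected at boot).
 */
static enum msr_write_path select_msr_write_path(bool config_xen_pv,
                                                 bool feature_xenpv)
{
    /* Without CONFIG_XEN_PV the feature is disabled and can never be set. */
    assert(config_xen_pv || !feature_xenpv);

    if (!config_xen_pv)
        return MSR_WRITE_NATIVE;                 /* case 1 */

    return feature_xenpv ? MSR_WRITE_XEN         /* case 2.2 */
                         : MSR_WRITE_NATIVE;     /* case 2.1 */
}
```

Only the combination of CONFIG_XEN_PV and X86_FEATURE_XENPV ends up in the
Xen path; every other combination takes the native one.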
>
> The alternatives mechanism is also used to choose the new immediate
> form MSR write instruction when it's available.

Yes, this needs to be added.

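
A rough sketch of what that selection could look like, again as a plain C
model rather than real alternatives code. The names are hypothetical, and
the condition that the MSR index must be a compile-time constant is my
assumption, based on the index being encoded as an immediate. The helper
for the register form shows the ECX/EDX:EAX operand split used by the
classic WRMSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum wrmsr_variant { WRMSR_REGISTER_FORM, WRMSR_IMMEDIATE_FORM };

/* Operands of the classic register form: index in ECX, value in EDX:EAX. */
struct wrmsr_regs {
    uint32_t ecx;
    uint32_t eax;
    uint32_t edx;
};

static struct wrmsr_regs wrmsr_register_operands(uint32_t msr, uint64_t val)
{
    struct wrmsr_regs r = {
        .ecx = msr,
        .eax = (uint32_t)val,          /* low 32 bits */
        .edx = (uint32_t)(val >> 32),  /* high 32 bits */
    };

    /* The two halves must recombine to the original value. */
    assert((((uint64_t)r.edx << 32) | r.eax) == val);
    return r;
}

/*
 * Assumption: the immediate form is only usable when the CPU supports it
 * and the MSR index is known at build time.
 */
static enum wrmsr_variant select_wrmsr_variant(bool cpu_has_imm_form,
                                               bool msr_is_constant)
{
    if (cpu_has_imm_form && msr_is_constant)
        return WRMSR_IMMEDIATE_FORM;
    return WRMSR_REGISTER_FORM;
}
```

Note that the high half has to be computed as (u32)(val >> 32); casting
first and shifting afterwards would always give 0.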
> Consequently, remove the pv_ops MSR write APIs and the Xen callbacks.

I still don't see a major difference to today's solution.

Only the "paravirt" term has been eliminated.


Juergen