Message-ID: <2365af70-d36f-4663-b819-59d886936ef5@zytor.com>
Date: Tue, 13 May 2025 00:44:02 -0700
From: Xin Li <xin@...or.com>
To: Jürgen Groß <jgross@...e.com>,
linux-kernel@...r.kernel.org, x86@...nel.org,
virtualization@...ts.linux.dev
Cc: Ajay Kaher <ajay.kaher@...adcom.com>,
Broadcom internal kernel review list
<bcm-kernel-feedback-list@...adcom.com>,
Thomas Gleixner
<tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
xen-devel@...ts.xenproject.org,
Andrew Cooper <andrew.cooper3@...rix.com>
Subject: Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to
instruction interfaces
On 5/12/2025 4:20 AM, Jürgen Groß wrote:
> On 09.05.25 10:18, Xin Li wrote:
>> On 5/6/2025 2:20 AM, Juergen Gross wrote:
>> I'm trying to evaluate how to add the immediate-form MSR instructions
>> on top of this patch set, and I'm close to getting it done.
>
> There is something to consider when running as a Xen PV guest, ...
Andrew said he doesn't plan to expose WRMSRNS to PV guests, and doesn't
expect MSR_IMM to be useful in a PV guest either, with which I fully agree.
>>>
>>> Note that in the Xen PV case the RDMSR/WRMSR patching must not happen
>>> even as an intermediate step, as this would clobber the indirect call
>>> information needed when patching in the direct call for the Xen case.
>>
>> Good point!
>
> ... as this still needs to be true.
>
> There are 2 different ways to deal with this:
>
> 1. When running as a Xen PV guest disable X86_FEATURE_WRMSRNS and
> ASM_WRMSRNS_IMM (e.g. in xen_init_capabilities()).
>
> 2. Buffer the original instruction before patching in apply_alternatives()
> in order to avoid the sequence limitation above (see attached patch).
>
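Option 1 would be a small change. A sketch only (whether ASM_WRMSRNS_IMM
ends up as a clearable synthetic feature bit, and the exact X86_FEATURE_*
names, are assumptions on my side based on this thread):

```c
	/*
	 * Sketch: in xen_init_capabilities(), next to the existing
	 * setup_clear_cpu_cap() calls, clear the new MSR-instruction
	 * features so no patching to the new forms is attempted.
	 * Feature names are assumed, not final.
	 */
	setup_clear_cpu_cap(X86_FEATURE_WRMSRNS);
	setup_clear_cpu_cap(X86_FEATURE_MSR_IMM);
```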
>> Deciding whether to retain the pvops MSR API is the responsibility of
>> the x86 maintainers, who are the ones experiencing the challenges of
>> maintaining the code.
>
> Well, I'm the PV ops maintainer, so it is basically me who needs to deal
> with this. OTOH I do understand that diagnosis of problems with PV ops is
> more complicated than without.
Indeed, and significant improvements continue to be made there.
>
>>
>> tglx said @https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/:
>>
>> > I fundamentaly hate adding this to the PV infrastructure. We don't
>> > want more PV ops, quite the contrary.
>>
>> That is the reason I took a different direction, i.e., removing the
>> pvops MSR APIs. But if your approach is cleaner, they may prefer it.
>
> In the end it isn't adding additional PV ops interfaces. It is modifying
> existing ones.
>
>>
>>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/
>>> paravirt.h
>>> index a463c747c780..df10b0e4f7b8 100644
>>> --- a/arch/x86/include/asm/paravirt.h
>>> +++ b/arch/x86/include/asm/paravirt.h
>>> @@ -175,24 +175,72 @@ static inline void __write_cr4(unsigned long x)
>>> PVOP_VCALL1(cpu.write_cr4, x);
>>> }
>>> -static inline u64 paravirt_read_msr(u32 msr)
>>> +static __always_inline u64 paravirt_read_msr(u32 msr)
>>> {
>>> - return PVOP_CALL1(u64, cpu.read_msr, msr);
>>> + EAX_EDX_DECLARE_ARGS(val, low, high);
>>
>> This is under CONFIG_PARAVIRT_XXL, thus CONFIG_XEN_PV and CONFIG_X86_64,
>> therefore we don't need to consider 32-bit at all, no?
>
> Right. OTOH the macros are there, so why not use them?
>
> In the end I'm fine to open code the 64-bit case here.
>
Here is a patch I cooked up. I added an ALTERNATIVE() hack because the
new instructions can't be longer than 6 bytes. But with the patch you
just sent, it shouldn't be needed.
diff --git a/arch/x86/include/asm/paravirt.h
b/arch/x86/include/asm/paravirt.h
index df10b0e4f7b8..82ffc11d6f1f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -177,18 +177,20 @@ static inline void __write_cr4(unsigned long x)
static __always_inline u64 paravirt_read_msr(u32 msr)
{
- EAX_EDX_DECLARE_ARGS(val, low, high);
+ u64 val;
PVOP_TEST_NULL(cpu.read_msr);
asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
"rdmsr", ALT_NOT_XEN,
ALT_CALL_INSTR, ALT_XENPV_CALL)
+ ALTERNATIVE("", "shl $0x20, %%rdx; or %%rdx, %%rax", ALT_NOT_XEN)
"2:\n"
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
- : EAX_EDX_RET(val, low, high), ASM_CALL_CONSTRAINT
- : paravirt_ptr(cpu.read_msr), "c" (msr));
+ : "=a" (val), ASM_CALL_CONSTRAINT
+ : paravirt_ptr(cpu.read_msr), "c" (msr)
+ : "rdx");
- return EAX_EDX_VAL(val, low, high);
+ return val;
}
static __always_inline void paravirt_write_msr(u32 msr, u64 val)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index ea3d7d583254..cacd9c37c3bd 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1204,20 +1206,20 @@ __visible u64 xen_read_msr(u32 msr)
return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
}
+
#define PV_PROLOGUE_MSR_xen_read_msr "mov %ecx, %edi;"
-#define PV_EPILOGUE_MSR_xen_read_msr \
- "mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
+#define PV_EPILOGUE_MSR_xen_read_msr
PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr);
-__visible void xen_write_msr(u32 msr, u32 low, u32 high)
+__visible void xen_write_msr(u32 msr, u64 val)
{
int err;
- xen_do_write_msr(msr, (u64)high << 32 | low,
- xen_msr_safe ? &err : NULL);
+ xen_do_write_msr(msr, val, xen_msr_safe ? &err : NULL);
}
+
#define PV_PROLOGUE_MSR_xen_write_msr \
- "mov %ecx, %edi; mov %eax, %esi;"
+ "mov %ecx, %edi; mov %rax, %rsi;"
#define PV_EPILOGUE_MSR_xen_write_msr
PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr);
>>> +__visible int xen_write_msr_safe(u32 msr, u32 low, u32 high)
>>
>> I think we can avoid splitting this u64 into two u32.
>
> This is related to the native WRMSR interface. The WRMSR needs to be
> able to be replaced by the call of the Xen specific function.
>
> I could handle this in the prologue helpers, but I'd prefer to keep
> those helpers as small as possible.
The above patch makes PV_EPILOGUE_MSR_xen_read_msr empty, because only
RDMSR needs to convert edx:eax into a 64-bit register, and that
conversion is already done in paravirt_read_msr().
For xen_write_msr(), the change is simple enough.
Thanks!
Xin