Message-ID: <9ef5f03f-da33-438a-838a-26a99c05fd39@citrix.com>
Date: Fri, 16 Aug 2024 22:45:23 +0100
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: "H. Peter Anvin" <hpa@...or.com>, Xin Li <xin@...or.com>,
linux-kernel@...r.kernel.org
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, peterz@...radead.org,
seanjc@...gle.com
Subject: Re: [PATCH v1 2/3] x86/msr: Switch between WRMSRNS and WRMSR with the
alternatives mechanism
On 16/08/2024 8:18 pm, H. Peter Anvin wrote:
> On August 16, 2024 11:40:05 AM PDT, Andrew Cooper <andrew.cooper3@...rix.com> wrote:
>> On 16/08/2024 6:52 pm, Xin Li wrote:
>>> On 8/9/2024 4:07 PM, Andrew Cooper wrote:
>>>> On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
>>>>> From: Andrew Cooper <andrew.cooper3@...rix.com>
>>>>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>>>>> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>>>>> +
>>>>> +/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
>>>>> static __always_inline void wrmsrns(u32 msr, u64 val)
>>>>> {
>>>>> - __wrmsrns(msr, val, val >> 32);
>>>>> + /*
>>>>> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
>>>>> + * DS prefix to avoid a trailing NOP.
>>>>> + */
>>>>> + asm volatile("1: "
>>>>> + ALTERNATIVE("ds wrmsr",
>>>> This isn't the version I presented, and there's no discussion of the
>>>> alteration.
>>> I'm trying to implement wrmsr() as
>>>
>>> static __always_inline void wrmsr(u32 msr, u64 val)
>>> {
>>> asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS,
>>> X86_FEATURE_WRMSRNS,
>>> "call asm_xen_write_msr", X86_FEATURE_XENPV)
>>> "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>>> : : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
>>> "D" (msr), "S" (val));
>>> }
>>>
>>>
>>> As the CALL instruction is 5-byte long, and we need to pad nop for both
>>> WRMSR and WRMSRNS, what about not using segment prefix at all?
>> The prefix was a minor optimisation to avoid having a trailing nop at
>> the end.
>>
>> When combined with a call, you need 3 prefixes on WRMSR and 2 prefixes
>> on WRMSRNS to make all options be 5 bytes long.
>>
>> That said, there's already a paravirt hook for this, and if you're
>> looking to work around the code gen mess for that, then doing it like
>> this by doubling up into rdi and rsi isn't great either.
>>
>> My suggestion, not that I've had time to experiment, was to change
>> paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
>> edx:eax into rsi. That way the top level wrmsr() retains sensible
>> codegen for native even when paravirt is active.
>>
>> ~Andrew
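For reference, the byte counting in the quoted text above can be sketched
in plain userspace C. This is only an illustration of the arithmetic, not
kernel code: pad_with_ds(), SLOT_LEN and the array names are hypothetical,
and the only facts relied on are the encodings (WRMSR = 0f 30,
WRMSRNS = 0f 01 c6, DS prefix = 3e, CALL rel32 = 5 bytes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DS_PREFIX 0x3e  /* redundant DS segment-override prefix */
#define SLOT_LEN  5     /* CALL rel32: 0xe8 plus a 4-byte displacement */

static const uint8_t wrmsr_insn[]   = { 0x0f, 0x30 };        /* WRMSR   */
static const uint8_t wrmsrns_insn[] = { 0x0f, 0x01, 0xc6 };  /* WRMSRNS */

/*
 * Hypothetical helper: place `insn` at the end of `slot`, filling the
 * leading bytes with DS prefixes so the whole slot is SLOT_LEN bytes.
 * Returns the number of prefixes used, or -1 if `insn` does not fit.
 */
static int pad_with_ds(uint8_t slot[SLOT_LEN], const uint8_t *insn, size_t len)
{
	size_t npfx;

	if (len > SLOT_LEN)
		return -1;

	npfx = SLOT_LEN - len;
	memset(slot, DS_PREFIX, npfx);   /* leading redundant prefixes */
	memcpy(slot + npfx, insn, len);  /* the instruction itself */
	return (int)npfx;
}
```

With a 5-byte slot this gives 3 prefixes for WRMSR and 2 for WRMSRNS,
so none of the alternatives needs a trailing NOP.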
> Heh, that was my suggestion, except that I suggested defining it so rax pass the full value; the generation of edx still is necessary but there is no real reason to have to recombine them. (One could even add that code to the assembly pattern as the CALL instruction is longer.)
Hmm yeah, having %rax be full is likely how the logic is going to look
before generating %edx.
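Roughly, and only as an untested userspace sketch (struct msr_regs,
msr_load_regs() and msr_native_view() are made-up names, not anything in
the tree), the register setup would look like this. It leans on WRMSR
ignoring the upper halves of %rax and %rdx, so a full %rax is harmless on
the native path while a called helper could read the whole value from it.

```c
#include <stdint.h>

/* Hypothetical model of the registers loaded before WRMSR or the CALL. */
struct msr_regs {
	uint32_t ecx;  /* MSR index */
	uint64_t rax;  /* full 64-bit value; WRMSR only consumes the low half */
	uint32_t edx;  /* high 32 bits, generated from the full value */
};

/* Load the full value into rax first, then derive edx from it. */
static struct msr_regs msr_load_regs(uint32_t msr, uint64_t val)
{
	struct msr_regs r;

	r.ecx = msr;
	r.rax = val;
	r.edx = (uint32_t)(val >> 32);
	return r;
}

/* The value native WRMSR would write: edx:eax, upper half of rax ignored. */
static uint64_t msr_native_view(const struct msr_regs *r)
{
	return ((uint64_t)r->edx << 32) | (uint32_t)r->rax;
}
```

Both consumers then agree on the value without any recombining step on
the callee side.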
> My biggest question is how the #GP on an invalid MSR access is handled with Xen. The rest is trivial.
For historical reasons it's a mess.
xen_do_write_msr() does several things.
* Turns FSBASE/GSBASE/GSKERN into their respective hypercalls (although
with no error checking at all!)
* Discards modifications to the SYSCALL/SYSENTER MSRs
Otherwise, it uses the native accessor, which takes the #GP path into
Xen and is emulated there (including full decode).
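As a rough model of that dispatch (a userspace sketch, not the actual
function: classify_msr_write() and the enum are hypothetical, and the
exact set of SYSCALL/SYSENTER MSRs listed is my assumption from the
architectural definitions):

```c
#include <stdint.h>

/* Architectural MSR indices. */
#define MSR_IA32_SYSENTER_CS   0x00000174u
#define MSR_IA32_SYSENTER_ESP  0x00000175u
#define MSR_IA32_SYSENTER_EIP  0x00000176u
#define MSR_STAR               0xc0000081u
#define MSR_LSTAR              0xc0000082u
#define MSR_CSTAR              0xc0000083u
#define MSR_SYSCALL_MASK       0xc0000084u
#define MSR_FS_BASE            0xc0000100u
#define MSR_GS_BASE            0xc0000101u
#define MSR_KERNEL_GS_BASE     0xc0000102u

/* Hypothetical classification of what happens to a PV guest's MSR write. */
enum msr_write_action {
	MSR_WRITE_HYPERCALL,  /* turned into a segment-base hypercall */
	MSR_WRITE_DISCARD,    /* silently dropped */
	MSR_WRITE_NATIVE,     /* native WRMSR, #GP into Xen, emulated */
};

static enum msr_write_action classify_msr_write(uint32_t msr)
{
	switch (msr) {
	case MSR_FS_BASE:
	case MSR_GS_BASE:
	case MSR_KERNEL_GS_BASE:
		return MSR_WRITE_HYPERCALL;

	case MSR_STAR:
	case MSR_LSTAR:
	case MSR_CSTAR:
	case MSR_SYSCALL_MASK:
	case MSR_IA32_SYSENTER_CS:
	case MSR_IA32_SYSENTER_ESP:
	case MSR_IA32_SYSENTER_EIP:
		return MSR_WRITE_DISCARD;

	default:
		return MSR_WRITE_NATIVE;
	}
}
```

Only the default case ever reaches the WRMSR instruction, so that is the
only path where the #GP question comes up.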
Way back in the day, XenPV's paravirt wrmsr would swallow #GP and
pretend success, and things then started depending on this behaviour in
order to boot. But times have moved on, and even the normal accessors
are really safe+warn now, so I'm sure all of this could be cleaned up.
For ages I've been wanting to make a single PV_PRIV_OP hypercall so we
can skip the x86 emulator, but I've never had time to do that either.
~Andrew