linux-kernel - Re: [PATCH v1 2/3] x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0ec48b84-d158-47c6-b14c-3563fd14bcc4@zytor.com>
Date: Fri, 16 Aug 2024 15:59:03 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Andrew Cooper <andrew.cooper3@...rix.com>, Xin Li <xin@...or.com>,
        linux-kernel@...r.kernel.org
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, x86@...nel.org, peterz@...radead.org,
        seanjc@...gle.com
Subject: Re: [PATCH v1 2/3] x86/msr: Switch between WRMSRNS and WRMSR with the
 alternatives mechanism

On 8/16/24 14:26, H. Peter Anvin wrote:
> On 8/16/24 11:40, Andrew Cooper wrote:
>>>
>>> As the CALL instruction is 5-byte long, and we need to pad nop for both
>>> WRMSR and WRMSRNS, what about not using segment prefix at all?
>>
> 
> You can use up to 4 prefixes of any kind (which includes opcode prefixes 
> before 0F) before most decoders start hurting, so we can pad it out to 5 
> bytes by doing 3f 3f .. .. ..
> 
>>
>> My suggestion, not that I've had time to experiment, was to change
>> paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
>> edx:eax into rsi.  That way the top level wrmsr() retains sensible
>> codegen for native even when paravirt is active.
>>
> 
> I have attached what should be an "obvious" example... famous last words.
> 

After looking at what xen_do_write_msr looks like, I realized we can do 
*much* better than that. This means using two alternative sequences, one 
for native/Xen and the other for native wrmsr/wrmsrns.

The call/jz sequence is *exactly* the same length as mov, shr, which 
means that xen_do_write_msr can simply return a flag if it wants the MSR 
access to be performed natively.

An important point is that in no code path can the error code be set to 
-EIO except by native MSR invocation, so the only possibilities are 
"done successfully" or "execute natively." [That being said, that would 
not be *that* hard to fix if needed.]

The new C prototype for xen_do_write_msr() becomes:

bool xen_do_write_msr(uint32_t msr, uint64_t val)

... with a true return meaning "execute natively."

(I changed the order of the arguments from my last version since it is 
more consistent with what is used everywhere else.)

RDMSR is a bit trickier. I think the best option there is to set up a 
new permission trap handler type that amounts to "get the address from 
the stack, apply a specific offset, and invoke the fixup handler for 
that address:

	case EX_TYPE_UPLEVEL: {
		/* Let reg hold the unsigned number of machine
		 * words to pop off the stack before the return
		 * address, and imm the signed offset from the
		 * return address to the desired trap point.
		 *
		 * pointer in units of machine words, and imm the
		 * signed offset from this stack word...
		 */
		unsigned long *sp = (unsigned long *)regs->sp + reg;
		regs->ip = *sp++ + (int16_t)imm;
		regs->sp = (unsigned long)sp;
		goto again;	/* Loop back to the beginning */
	}

Again, "obviously correct" code attached.

	-hpa

NOTE:

I had to dig several levels down to uncover actual situation, but 
pmu_msr_write() is actually a completely pointless function: all it does 
is shuffle some arguments, then calls pmu_msr_chk_emulated() and if it 
returns true AND the emulated flag is clear then does *exactly the same 
thing* that the calling code would have done if pmu_msr_write() itself 
had returned true... in other words, the whole function could be 
replaced by:

bool pmu_msr_write(uint32_t msr, uint64_t val)
{
         bool emulated;

         return pmu_msr_chk_emulated(msr, &val, false, &emulated) &&
  	       emulated;
}

pmu_msr_read() does the equivalent stupidity; the obvious fix(es) to 
pmu_msr_chk_emulated() I hope are obvious.

View attachment "xen.S" of type "text/plain" (3135 bytes)