Message-ID: <083519ab-752f-9815-7741-22b3fcc03322@intel.com>
Date: Tue, 17 May 2022 15:16:42 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
seanjc@...gle.com
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
luto@...nel.org, peterz@...radead.org,
sathyanarayanan.kuppuswamy@...ux.intel.com, ak@...ux.intel.com,
dan.j.williams@...el.com, david@...hat.com, hpa@...or.com,
thomas.lendacky@....com, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a
shared page
On 5/17/22 13:17, Kirill A. Shutemov wrote:
>>> Given that we had to adjust IP in handle_mmio() anyway, do you still think
>>> "ve->instr_len = 0;" is wrong? I dislike ip_adjusted more.
>> Something is wrong about it.
>>
>> You could call it 've->instr_bytes_to_handle' or something. Then it
>> makes actual logical sense when you handle it to zero it out. I just
>> want it to be more explicit when the upper levels need to do something.
>>
>> Does ve->instr_len==0 both when the TDX module isn't providing
>> instruction sizes *and* when no handling is necessary? That seems like
>> an unfortunate logical multiplexing of 0.
> For EPT violation, ve->instr_len has *something* (not zero) that doesn't
> match the actual instruction size. I dig out that it is filled with data
> from VMREAD(0x440C), but I don't know where is the ultimate origin of the
> data.
The SDM has a breakdown:
27.2.5 Information for VM Exits Due to Instruction Execution
I didn't realize it came from VMREAD. I guess I assumed it came from
some TDX module magic. Silly me.
The SDM makes it sound like we should be more judicious about using
've->instr_len' though. "All VM exits other than those listed in the
above items leave this field undefined." Looking over
virt_exception_kernel(), we've got five cases from CPU instructions that
cause unconditional VMEXITs:
	case EXIT_REASON_HLT:
	case EXIT_REASON_MSR_READ:
	case EXIT_REASON_MSR_WRITE:
	case EXIT_REASON_CPUID:
	case EXIT_REASON_IO_INSTRUCTION:

and should have that field filled out, plus one that doesn't:

	case EXIT_REASON_EPT_VIOLATION:
It seems awfully fragile to me to have the hardware be providing the
'instr_len' in those cases, but not in one other one. The data in there
is garbage for EXIT_REASON_EPT_VIOLATION. The reason we don't consume
garbage is that all the paths leading out of handle_mmio() that return
true also set 've->instr_len'. But that logic is entirely opaque.
It's also borderline criminal to have six functions that look identical
(in that switch statement), but one of them has different behavior for
've->instr_len'.
I'd probably do it like this:
static int handle_halt(struct ve_info *ve)
{
	/*
	 * Since non safe halt is mainly used in CPU offlining
	 * and the guest will always stay in the halt state, don't
	 * call the STI instruction (set do_sti as false).
	 */
	const bool irq_disabled = irqs_disabled();
	const bool do_sti = false;

	if (__halt(irq_disabled, do_sti))
		return -EIO;

	/*
	 * VM-exit instruction length is defined for HLT. See:
	 * "Information for VM Exits Due to Instruction Execution"
	 * in the SDM.
	 */
	return ve->instr_len;
}
Any >=0 return means the exception was handled and it tells the caller
how much to advance RIP.
Then handle_mmio() can say:
	/*
	 * VM-exit instruction length is not provided for the EPT
	 * violations that MMIO causes. Use the insn_decode() length:
	 */
	return insn.length;
See? Now everybody that goes and writes a new #VE exception helper has
a chance of actually getting this right. As it stands, if someone adds
one more of these, they'll probably get random behavior. This way, they
actually have to choose. They _might_ even go looking at the SDM.
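
To make the return convention concrete, here is a rough userspace sketch of what the caller side might look like. It is not the actual kernel code: the cut-down 'struct ve_info' and 'struct pt_regs', the sketch_handle_ve() dispatcher and the 'decoded_len' parameter (standing in for the length that insn_decode() would produce) are all made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct ve_info {
	uint64_t exit_reason;
	uint32_t instr_len;	/* only defined for some exit reasons */
};

struct pt_regs {
	uint64_t ip;
};

/* Exit reason values as used on VMX; only two are needed here. */
enum {
	EXIT_REASON_HLT			= 12,
	EXIT_REASON_EPT_VIOLATION	= 48,
};

/* HLT: the VM-exit instruction length is defined, so report it. */
static int handle_halt(struct ve_info *ve)
{
	return ve->instr_len;
}

/*
 * MMIO: instr_len is undefined for EPT violations, so ignore it and
 * report a decoded length instead. 'decoded_len' is an assumption
 * that stands in for insn_decode()'s result.
 */
static int handle_mmio(struct ve_info *ve, int decoded_len)
{
	(void)ve;

	if (decoded_len <= 0)
		return -EINVAL;

	return decoded_len;
}

/*
 * Hypothetical dispatcher: any >= 0 return from a helper means
 * "handled, advance RIP by this much". A negative return means the
 * exception was not handled and RIP is left alone.
 */
static bool sketch_handle_ve(struct pt_regs *regs, struct ve_info *ve,
			     int decoded_len)
{
	int insn_len;

	switch (ve->exit_reason) {
	case EXIT_REASON_HLT:
		insn_len = handle_halt(ve);
		break;
	case EXIT_REASON_EPT_VIOLATION:
		insn_len = handle_mmio(ve, decoded_len);
		break;
	default:
		return false;
	}

	if (insn_len < 0)
		return false;

	/* The only place that touches RIP. */
	regs->ip += insn_len;
	return true;
}
```

The point of the shape is that RIP gets adjusted in exactly one spot, and every helper has to say explicitly where its length came from.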