linux-kernel - Re: [PATCH] x86/tdx: Handle load_unaligned

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YoQkTtrMiU2bff9i@google.com>
Date:   Tue, 17 May 2022 22:40:14 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        luto@...nel.org, peterz@...radead.org,
        sathyanarayanan.kuppuswamy@...ux.intel.com, ak@...ux.intel.com,
        dan.j.williams@...el.com, david@...hat.com, hpa@...or.com,
        thomas.lendacky@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a
 shared page

On Tue, May 17, 2022, Dave Hansen wrote:
> On 5/17/22 13:17, Kirill A. Shutemov wrote:
> >>> Given that we had to adjust IP in handle_mmio() anyway, do you still think
> >>> "ve->instr_len = 0;" is wrong? I dislike ip_adjusted more.
> >> Something is wrong about it.
> >>
> >> You could call it 've->instr_bytes_to_handle' or something. Then it
> >> makes actual logical sense when you handle it to zero it out.  I just
> >> want it to be more explicit when the upper levels need to do something.
> >>
> >> Does ve->instr_len==0 both when the TDX module isn't providing
> >> instruction sizes *and* when no handling is necessary?  That seems like
> >> an unfortunate logical multiplexing of 0.
> > For EPT violation, ve->instr_len has *something* (not zero) that doesn't
> > match the actual instruction size. I dig out that it is filled with data
> > from VMREAD(0x440C), but I don't know where is the ultimate origin of the
> > data.
> 
> The SDM has a breakdown:
> 
> 	27.2.5 Information for VM Exits Due to Instruction Execution
> 
> I didn't realize it came from VMREAD.  I guess I assumed it came from
> some TDX module magic.  Silly me.
> 
> The SDM makes it sound like we should be more judicious about using
> 've->instr_len' though.  "All VM exits other than those listed in the
> above items leave this field undefined."  Looking over
> virt_exception_kernel(), we've got five cases from CPU instructions that
> cause unconditional VMEXITs:

None of the below exit unconditionally.

>         case EXIT_REASON_HLT:
>         case EXIT_REASON_MSR_READ:
>         case EXIT_REASON_MSR_WRITE:
>         case EXIT_REASON_CPUID:
>         case EXIT_REASON_IO_INSTRUCTION:
> 
> and should have that field filled out, plus one that doesn't:
> 
>         case EXIT_REASON_IO_INSTRUCTION:

I/O fills the length.  IN, INS, OUT, and OUTS are all listed.  It's not just
unconditional exits that provide the instruction length.  The instruction length
is provided if the instruction is the direct cause of the exit, whether or not
the instruction exits unconditionally doesn't matter.

  For fault-like VM exits due to attempts to execute one of the following
  instructions that cause VM exits unconditionally or based on the settings of
  VM-execution controls.

> Then handle_mmio() can say:
> 
> 	/*
> 	 * VM-exit instruction length is not provided for the EPT
> 	 * violations that MMIO causes.  Use the insn_decode() length:

This is inaccurate.  The instruction length _is_ provided on EPT Violation VM-Exits
(it's also provided by all Intel CPUs on EPT Misconfigs even though the SDM doesn't
say so).

The instruction length is wrong in the TDX case because there is no EPT Violation
VM-Exit.  The EPT Violation is morphed to a #VE by the CPU, and the instruction
length isn't one of the fields that's saved into the #VE info struct by the CPU.
When the TDX Module gets control on the TDCALL, VMCS.INSTRUCTION_LENGTH will hold
the length of the TDCALL, not the instruction that caused the #VE, i.e. the TDX
Module can't provide the correct length.

For all other #VE cases in TDX, the #VE is injected by software (TDX module) after
the instruction-based VM-Exit.  Before injecting the #VE, the TDX module grabs the
length from the VMCS and manually records it in the #VE info struct.