linux-kernel - Re: [PATCH] x86/tdx: Handle load_unaligned

Message-ID: <083519ab-752f-9815-7741-22b3fcc03322@intel.com>
Date:   Tue, 17 May 2022 15:16:42 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        seanjc@...gle.com
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        luto@...nel.org, peterz@...radead.org,
        sathyanarayanan.kuppuswamy@...ux.intel.com, ak@...ux.intel.com,
        dan.j.williams@...el.com, david@...hat.com, hpa@...or.com,
        thomas.lendacky@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a
 shared page

On 5/17/22 13:17, Kirill A. Shutemov wrote:
>>> Given that we had to adjust IP in handle_mmio() anyway, do you still think
>>> "ve->instr_len = 0;" is wrong? I dislike ip_adjusted more.
>> Something is wrong about it.
>>
>> You could call it 've->instr_bytes_to_handle' or something. Then it
>> makes actual logical sense when you handle it to zero it out.  I just
>> want it to be more explicit when the upper levels need to do something.
>>
>> Does ve->instr_len==0 both when the TDX module isn't providing
>> instruction sizes *and* when no handling is necessary?  That seems like
>> an unfortunate logical multiplexing of 0.
> For EPT violation, ve->instr_len has *something* (not zero) that doesn't
> match the actual instruction size. I dig out that it is filled with data
> from VMREAD(0x440C), but I don't know where is the ultimate origin of the
> data.

The SDM has a breakdown:

	27.2.5 Information for VM Exits Due to Instruction Execution

I didn't realize it came from VMREAD.  I guess I assumed it came from
some TDX module magic.  Silly me.

The SDM makes it sound like we should be more judicious about using
've->instr_len' though.  "All VM exits other than those listed in the
above items leave this field undefined."  Looking over
virt_exception_kernel(), we've got five cases from CPU instructions that
cause unconditional VMEXITs:

        case EXIT_REASON_HLT:
        case EXIT_REASON_MSR_READ:
        case EXIT_REASON_MSR_WRITE:
        case EXIT_REASON_CPUID:
        case EXIT_REASON_IO_INSTRUCTION:

and should have that field filled out, plus one that doesn't:

        case EXIT_REASON_EPT_VIOLATION:

It seems awfully fragile to me to have the hardware be providing the
'instr_len' in those cases, but not in one other one.  The data in there
is garbage for EXIT_REASON_EPT_VIOLATION.  The reason we don't consume
garbage is that all the paths leading out of handle_mmio() that return
true also set 've->instr_len'.  But that logic is entirely opaque.

It's also borderline criminal to have six functions that look identical
(in that switch statement), but one of them has different behavior for
've->instr_len'.

I'd probably do it like this:

static int handle_halt(struct ve_info *ve)
{
        /*
         * Since non safe halt is mainly used in CPU offlining
         * and the guest will always stay in the halt state, don't
         * call the STI instruction (set do_sti as false).
         */
        const bool irq_disabled = irqs_disabled();
        const bool do_sti = false;

        if (__halt(irq_disabled, do_sti))
                return -EIO;

        /*
         * VM-exit instruction length is defined for HLT.  See:
         * "Information for VM Exits Due to Instruction Execution"
         * in the SDM.
         */
        return ve->instr_len;
}

Any >=0 return means the exception was handled, and the value tells the
caller how much to advance RIP.

Then handle_mmio() can say:

	/*
	 * VM-exit instruction length is not provided for the EPT
	 * violations that MMIO causes.  Use the insn_decode() length:
	 */
        return insn.length;

See?  Now everybody that goes and writes a new #VE exception helper has
a chance of actually getting this right.  As it stands, if someone adds
one more of these, they'll probably get random behavior.  This way, they
actually have to choose.  They _might_ even go looking at the SDM.
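
For illustration only, the caller side of that convention might look
roughly like the userspace mock below.  It is a sketch, not the kernel
code: the cut-down 'struct ve_info' and 'struct pt_regs', the mock_*
names, and the 'decoded_len' parameter (standing in for whatever
insn_decode() would report) are all assumptions made up for the example.
The point it tries to show is that each handler returns either a
negative errno or the number of bytes to skip, and the dispatcher is the
only place that touches the instruction pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel's types. */
enum exit_reason {
	EXIT_REASON_HLT,
	EXIT_REASON_EPT_VIOLATION,
	EXIT_REASON_UNKNOWN,
};

struct ve_info {
	enum exit_reason exit_reason;
	unsigned int instr_len;	/* only meaningful for some exit reasons */
};

struct pt_regs {
	unsigned long ip;
};

/* HLT: the VM-exit instruction length is defined, so hand it back. */
static int mock_handle_halt(const struct ve_info *ve)
{
	return (int)ve->instr_len;
}

/*
 * MMIO (EPT violation): ve->instr_len is undefined here, so ignore it
 * and return the length a decoder reported instead.
 */
static int mock_handle_mmio(const struct ve_info *ve, int decoded_len)
{
	(void)ve;
	return decoded_len;
}

/*
 * Dispatcher: a negative return from a handler is passed up as an error
 * and the instruction pointer is left alone; otherwise the return value
 * is the number of bytes to advance.
 */
int mock_ve_dispatch(struct pt_regs *regs, const struct ve_info *ve,
		     int decoded_len)
{
	int insn_len;

	assert(regs != NULL && ve != NULL);

	switch (ve->exit_reason) {
	case EXIT_REASON_HLT:
		insn_len = mock_handle_halt(ve);
		break;
	case EXIT_REASON_EPT_VIOLATION:
		insn_len = mock_handle_mmio(ve, decoded_len);
		break;
	default:
		return -EIO;
	}

	if (insn_len < 0)
		return insn_len;

	regs->ip += (unsigned long)insn_len;
	return 0;
}
```

With this shape, a new handler cannot silently inherit whatever happens
to be in 've->instr_len': it has to return a length, so the author has
to decide where that length comes from.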