Message-ID: <A53EB816-EF0F-48F1-8F4F-E1BB4BF25BD3@vmware.com>
Date: Thu, 10 Jan 2019 17:29:59 +0000
From: Nadav Amit <namit@...are.com>
To: Josh Poimboeuf <jpoimboe@...hat.com>
CC: X86 ML <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
Andy Lutomirski <luto@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Jason Baron <jbaron@...mai.com>, Jiri Kosina <jkosina@...e.cz>,
David Laight <David.Laight@...LAB.COM>,
Borislav Petkov <bp@...en8.de>,
Julia Cartwright <julia@...com>, Jessica Yu <jeyu@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
Edward Cree <ecree@...arflare.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [PATCH v3 5/6] x86/alternative: Use a single access in
text_poke() where possible

> On Jan 10, 2019, at 9:20 AM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
>
> On Thu, Jan 10, 2019 at 09:32:23AM +0000, Nadav Amit wrote:
>>> @@ -714,14 +714,39 @@ void *text_poke(void *addr, const void *opcode, size_t len)
>>> }
>>> BUG_ON(!pages[0]);
>>> local_irq_save(flags);
>>> +
>>> set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>>> if (pages[1])
>>> set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
>>> - vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>>> - memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>>> +
>>> + vaddr = fix_to_virt(FIX_TEXT_POKE0) + ((unsigned long)addr & ~PAGE_MASK);
>>> +
>>> + /*
>>> + * Use a single access where possible. Note that a single unaligned
>>> + * multi-byte write will not necessarily be atomic on x86-32, or if the
>>> + * address crosses a cache line boundary.
>>> + */
>>> + switch (len) {
>>> + case 1:
>>> + WRITE_ONCE(*(u8 *)vaddr, *(u8 *)opcode);
>>> + break;
>>> + case 2:
>>> + WRITE_ONCE(*(u16 *)vaddr, *(u16 *)opcode);
>>> + break;
>>> + case 4:
>>> + WRITE_ONCE(*(u32 *)vaddr, *(u32 *)opcode);
>>> + break;
>>> + case 8:
>>> + WRITE_ONCE(*(u64 *)vaddr, *(u64 *)opcode);
>>> + break;
>>> + default:
>>> + memcpy((void *)vaddr, opcode, len);
>>> + }
>>> +
>>
>> Even if Intel and AMD CPUs are guaranteed to run instructions from L1
>> atomically, this may break instruction emulators, such as those that
>> hypervisors use. On SMP VMs, they might not read instructions atomically
>> when the guest's text_poke() races with the emulated instruction fetch.
>>
>> While I can't find a reason for hypervisors to emulate this instruction,
>> smarter people might find ways to turn it into a security exploit.
>
> Interesting point... but I wonder if it's a realistic concern. BTW,
> text_poke_bp() also relies on undocumented behavior.
>
> The entire instruction doesn't need to be read atomically; just the
> 32-bit call destination. Assuming the hypervisor is x86-64, and it uses
> a 32-bit access to read the call destination (which seems logical), the
> intra-cacheline reads will be atomic, as stated in the SDM.
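(For concreteness, the single 32-bit access being assumed would look
something like the sketch below. This is purely illustrative, not
something KVM actually has; fetch_call_rel32() is a made-up name.)

	/*
	 * Hypothetical sketch: fetch the rel32 operand of an E8 CALL
	 * with a single 4-byte access. READ_ONCE() forces the compiler
	 * to emit one load, which the SDM guarantees to be atomic as
	 * long as it does not cross a cache line boundary.
	 */
	static u32 fetch_call_rel32(const u8 *insn)
	{
		/* insn points at the 0xE8 opcode; rel32 follows it */
		return READ_ONCE(*(const u32 *)(insn + 1));
	}
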
At least in KVM, it doesn’t do so intentionally: the emulated fetch is
eventually done using __copy_from_user(), so now you are relying on
__copy_from_user() to perform the access atomically.
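
To illustrate the failure mode with a quick user-space sketch (mine, not
kernel code): if the writer updates the four bytes one at a time, the way
a byte-wise copy would, a reader can observe a value that is neither the
old nor the new one, even though its own 4-byte load is atomic.

	#include <pthread.h>
	#include <stdint.h>
	#include <stdio.h>

	#define OLD 0x11223344u
	#define NEW 0xaabbccddu

	/* naturally aligned, so the reader's load cannot itself tear */
	static volatile uint32_t word = OLD;

	static void *writer(void *arg)
	{
		/* char-typed view of the word, stored byte by byte */
		volatile uint8_t *b = (volatile uint8_t *)&word;

		(void)arg;
		for (;;) {
			for (int i = 0; i < 4; i++)	/* little-endian, as on x86 */
				b[i] = NEW >> (8 * i);
			for (int i = 0; i < 4; i++)
				b[i] = OLD >> (8 * i);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, writer, NULL);
		for (;;) {
			uint32_t v = word;	/* one aligned 4-byte load */

			if (v != OLD && v != NEW)
				printf("torn read: %#x\n", v);
		}
	}

Built with "gcc -O2 -pthread", this should print torn values such as
0x1122ccdd almost immediately.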
> If the above assumptions are not true, and the hypervisor reads the call
> destination non-atomically (which seems unlikely IMO), even then I don't
> see how it could be realistically exploitable. It would just oops from
> calling a corrupt address.
It might still be exploitable as a DoS, though (again, not that I can say
exactly how). Having said that, I might just be negative because I’ve put a
lot of effort into avoiding this problem according to the SDM…