Date:   Fri, 8 Dec 2017 22:48:07 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Andy Lutomirski <luto@...nel.org>
cc:     x86@...nel.org, linux-kernel@...r.kernel.org,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        David Laight <David.Laight@...lab.com>,
        Kees Cook <keescook@...omium.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC PTI 3/3] x86/pti: Put the LDT in its own PGD if PTI is on

On Fri, 8 Dec 2017, Andy Lutomirski wrote:

> With PTI on, we need the LDT to be in the usermode tables somewhere,
> and the LDT is per-mm.
> 
> tglx had a hack to have a per-cpu LDT and context switch it, but it
> was probably insanely slow due to the required TLB flushes.
>
> Instead, take advantage of the fact that we have an address space
> hole that gives us a completely unused pgd and make that pgd be
> per-mm.  We can put the LDT in it.
> 
> This has a downside: the LDT isn't (currently) randomized, and an
> attack that can write the LDT is instant root due to call gates
> (thanks, AMD, for leaving call gates in AMD64 but designing them
> wrong so they're only useful for exploits).  We could mitigate this
> by making the LDT read-only or randomizing it, either of which is
> straightforward on top of this patch.

Randomizing, yes. RO, not so much. I spent quite some time with Peter
Zijlstra trying to make that work.

The problem is that the CPU tries to write the accessed bit in the entry
which is referenced by a selector register.

In most cases this is a non-issue because the access does not happen when
the selector is loaded; it happens when the selector is referenced,
e.g. mov fs:.... This results in an easily identifiable page fault, and
writing the accessed bit from a special fault handler works nicely.
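
Purely as a sketch (fixup_ldt_accessed() is a made-up name, the field
layout is the one from desc_defs.h, and the two-slot layout is ignored for
brevity), the fixup amounts to:

	static bool fixup_ldt_accessed(struct ldt_struct *ldt, unsigned long addr)
	{
		unsigned int idx;

		idx = (addr - LDT_BASE_ADDR) / sizeof(struct desc_struct);
		if (idx >= ldt->nr_entries)
			return false;

		/* Bit 0 of 'type' is the accessed bit for code/data segments. */
		ldt->entries[idx].type |= 1;
		return true;
	}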

Now what's special is the stack segment. If the LDT has been modified and
IRET or whatever brings the task back to user space, it loads SS from the
stack, and this immediately results in a #GP which does not tell us what
failed; a fixup of the accessed bit does not work in that case, so the #GP
just comes back.

Forcing the accessed bit to 1 when installing the descriptors does not help
either.
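
What was tried looks roughly like this (a sketch; ldt_preset_accessed() is
a made-up name): pre-set the accessed bit while copying the descriptors so
the CPU never has to write it. The SS reload path still ends up in the #GP.

	static void ldt_preset_accessed(struct desc_struct *entries,
					unsigned int nr_entries)
	{
		unsigned int i;

		/* Pre-set the accessed bit (bit 0 of 'type') in every entry. */
		for (i = 0; i < nr_entries; i++)
			entries[i].type |= 1;
	}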

So yes, thanks a lot to the folks who came up with that wonderful exploit
target.

> +static int map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
> +{
> +#ifdef CONFIG_X86_64
> +	pgd_t *pgd;
> +	p4d_t *p4d;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	bool is_vmalloc;
> +	bool had_top_level_entry;
> +	int i;
> +
> +	if (!static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_PTI))
> +		return 0;
> +
> +	WARN_ON(ldt->slot != -1);
> +
> +	/*
> +	 * Both LDT slots are contained in a single PMD, so we can pass
> +	 * LDT_BASE_ADDR instead of the real address to all the _offset
> +	 * helpers.
> +	 */
> +	pgd = pgd_offset(mm, LDT_BASE_ADDR);
> +
> +	/*
> +	 * Did we already have the top level entry allocated?  We can't
> +	 * use pgd_none() for this because it doesn't do anything on
> +	 * 4-level page table kernels.
> +	 */
> +	had_top_level_entry = (pgd->pgd != 0);
> +
> +	p4d = p4d_alloc(mm, pgd, LDT_BASE_ADDR);
> +	if (!p4d)
> +		return -ENOMEM;
> +
> +	pud = pud_alloc(mm, p4d, LDT_BASE_ADDR);
> +	if (!pud)
> +		return -ENOMEM;
> +	pmd = pmd_alloc(mm, pud, LDT_BASE_ADDR);
> +	if (!pmd)
> +		return -ENOMEM;
> +	if (pte_alloc(mm, pmd, LDT_BASE_ADDR))
> +		return -ENOMEM;

I really find it disgusting that we have a gazillion copies of exactly the
same code which just populates the page tables down to the level where PTEs
can be installed.

Can we pretty please have _ONE_ general out of line helper function for
this?
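
Something along these lines would do (a sketch only; populate_pte() is a
made-up name, the *_alloc() helpers are exactly the ones the patch already
calls):

	static pte_t *populate_pte(struct mm_struct *mm, unsigned long addr)
	{
		pgd_t *pgd = pgd_offset(mm, addr);
		p4d_t *p4d = p4d_alloc(mm, pgd, addr);
		pud_t *pud;
		pmd_t *pmd;

		if (!p4d)
			return NULL;
		pud = pud_alloc(mm, p4d, addr);
		if (!pud)
			return NULL;
		pmd = pmd_alloc(mm, pud, addr);
		if (!pmd)
			return NULL;
		if (pte_alloc(mm, pmd, addr))
			return NULL;
		return pte_offset_kernel(pmd, addr);
	}

Then map_ldt_struct() and all the other copies boil down to one call.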

> +
> +	if (mm->context.ldt) {
> +		/*
> +		 * We already had an LDT.  The top-level entry should already
> +		 * have been allocated and synchronized with the usermode
> +		 * tables.
> +		 */
> +		WARN_ON(!had_top_level_entry);
> +		if (static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_PTI))
> +			WARN_ON(!kernel_to_user_pgdp(pgd)->pgd);
> +	} else {
> +		WARN_ON(had_top_level_entry);
> +		if (static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_PTI)) {
> +			WARN_ON(kernel_to_user_pgdp(pgd)->pgd);
> +			set_pgd(kernel_to_user_pgdp(pgd), *pgd);
> +		}
> +	}
> +
> +	is_vmalloc = is_vmalloc_addr(ldt->entries);

Just get rid of that single-page allocation special case and use vmalloc
unconditionally.
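
I.e. in alloc_ldt_struct() something like (sketch, assuming the existing
nr_entries and LDT_ENTRY_SIZE names):

	/* Always vmalloc; no single-page special case, no is_vmalloc check. */
	ldt->entries = vzalloc(ALIGN(nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE));
	if (!ldt->entries) {
		kfree(ldt);
		return NULL;
	}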

Thanks,

	tglx
