Message-ID: <CAJcbSZGwnc5TDokC5D=YV5qbEFBfQOZWqOTsJr3rN9nR2M1XZw@mail.gmail.com>
Date: Tue, 21 Jun 2016 09:46:56 -0700
From: Thomas Garnier <thgarnie@...gle.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Kees Cook <keescook@...omium.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
Borislav Petkov <bp@...e.de>, Juergen Gross <jgross@...e.com>,
Matt Fleming <matt@...eblueprint.co.uk>,
Toshi Kani <toshi.kani@....com>, Baoquan He <bhe@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dan Williams <dan.j.williams@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Andy Lutomirski <luto@...nel.org>,
Alexander Kuleshov <kuleshovmail@...il.com>,
Alexander Popov <alpopov@...ecurity.com>,
Joerg Roedel <jroedel@...e.de>, Dave Young <dyoung@...hat.com>,
Lv Zheng <lv.zheng@...el.com>,
Mark Salter <msalter@...hat.com>,
Stephen Smalley <sds@...ho.nsa.gov>,
Dmitry Vyukov <dvyukov@...gle.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
David Rientjes <rientjes@...gle.com>,
Christian Borntraeger <borntraeger@...ibm.com>,
Jan Beulich <JBeulich@...e.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Seth Jennings <sjennings@...iantweb.net>,
Yinghai Lu <yinghai@...nel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64)
On Fri, Jun 17, 2016 at 3:26 AM, Ingo Molnar <mingo@...nel.org> wrote:
>
> * Kees Cook <keescook@...omium.org> wrote:
>
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1993,6 +1993,23 @@ config PHYSICAL_ALIGN
>>
>> Don't change this unless you know what you are doing.
>>
>> +config RANDOMIZE_MEMORY
>> + bool "Randomize the kernel memory sections"
>> + depends on X86_64
>> + depends on RANDOMIZE_BASE
>> + default RANDOMIZE_BASE
>> + ---help---
>> + Randomizes the base virtual address of kernel memory sections
>> + (physical memory mapping, vmalloc & vmemmap). This security feature
>> + makes exploits relying on predictable memory locations less reliable.
>> +
>> + The order of allocations remains unchanged. Entropy is generated in
>> + the same way as RANDOMIZE_BASE. The current implementation in the
>> + optimal configuration has on average 30,000 different possible
>> + virtual addresses for each memory section.
>> +
>> + If unsure, say N.
>
> So this is really poor naming!
>
> The user, in first approximation, will see something like this:
>
> Randomize the kernel memory sections (RANDOMIZE_MEMORY) [Y/n/?] (NEW)
>
> ... and could naively conclude that this feature will randomize memory contents.
>
> Furthermore, there's no apparent connection to another, related kernel feature:
>
> Randomize the address of the kernel image (KASLR) (RANDOMIZE_BASE) [Y/n/?]
>
> A better naming would be something like:
>
> Randomize other static kernel memory addresses (RANDOMIZE_ALL) [Y/n/?] (NEW)
>
> (assuming it's truly 'all')
>
> But ... I don't see it explained anywhere why a user, once he expressed interest
> in randomization, would even want to _not_ randomize as much as possible. I.e. why
> does this new option exist at all, shouldn't we just gradually extend
> randomization to more memory areas and control it via CONFIG_RANDOMIZE_BASE?
>
> Btw., CONFIG_RANDOMIZE_BASE is probably a misnomer, and after these patches it
> will be more of a misnomer - so we should rename it to something better. For
> example CONFIG_KASLR would have the advantage of being obviously named at a
> glance, to anyone who knows what 'ASLR' means. To those who don't know the short
> description will tell it:
>
> Kernel address space layout randomization (KASLR) [Y/n/?]
>
> This would also fit the kernel internal naming of kaslr.c/etc.
>
> What do you think?
>
>> +++ b/arch/x86/mm/kaslr.c
>> @@ -0,0 +1,167 @@
>> +/*
>> + * This file implements KASLR memory randomization for x86_64. It randomizes
>> + * the virtual address space of kernel memory sections (physical memory
>> + * mapping, vmalloc & vmemmap) for x86_64. This security feature mitigates
>> + * exploits relying on predictable kernel addresses.
>> + *
>> + * Entropy is generated using the KASLR early boot functions now shared in
>> + * the lib directory (originally written by Kees Cook). Randomization is
>> + * done on PGD & PUD page table levels to increase possible addresses. The
>> + * physical memory mapping code was adapted to support PUD level virtual
>> + * addresses. This implementation, in the best configuration, provides
>> + * 30,000 possible virtual addresses on average for each memory section. An
>> + * additional low memory page is used to ensure each CPU can start with a
>> + * PGD aligned virtual address (for realmode).
>> + *
>> + * The order of each memory section is not changed. The feature looks at
>> + * the available space for the sections based on different configuration
>> + * options and randomizes the base and space between each. The size of the
>> + * physical memory mapping is the available physical memory.
>> + *
>> + */
>
> (Stray newline.)
>
>> +#include <linux/kernel.h>
>> +#include <linux/errno.h>
>> +#include <linux/types.h>
>> +#include <linux/mm.h>
>> +#include <linux/smp.h>
>> +#include <linux/init.h>
>> +#include <linux/memory.h>
>> +#include <linux/random.h>
>> +
>> +#include <asm/processor.h>
>> +#include <asm/pgtable.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/e820.h>
>> +#include <asm/init.h>
>> +#include <asm/setup.h>
>> +#include <asm/kaslr.h>
>> +#include <asm/kasan.h>
>> +
>> +#include "mm_internal.h"
>
> How many of those include lines are truly unnecessary, or did this just get copied
> from another file?
>
That's correct, it was copied over. I will clean that up.
>> +/*
>> + * Memory base and end randomization is based on different configurations.
>> + * We want as much space as possible to increase entropy available.
>> + */
>> +static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
>> +
>> +#if defined(CONFIG_KASAN)
>> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
>> +#elif defined(CONFIG_X86_ESPFIX64)
>> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
>> +#elif defined(CONFIG_EFI)
>> +static const unsigned long memory_rand_end = EFI_VA_START;
>> +#else
>> +static const unsigned long memory_rand_end = __START_KERNEL_map;
>> +#endif
>
> It's unclear to me from the naming and the description whether these are virtual
> or physical memory ranges. Nor is it explained why this laundry list of kernel
> options influences the randomization range - and it's not explained how new kernel
> features modifying the virtual memory layout should plug into this mechanism.
>
> Could we please get a bit more verbose introduction into why all of this is done?
>
The comment was meant to say that we want as much space as possible to
increase the available entropy. I will make that clearer and add
information on how to change these variables.
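Something along these lines, for example (a sketch; exact wording to be
refined):

/*
 * These are all virtual addresses. The randomization range ends at the
 * first region reserved above the ones we randomize: the KASAN shadow,
 * the ESPFIX area or the EFI mapping, falling back to the start of the
 * kernel text mapping. A new feature that reserves virtual memory
 * below __START_KERNEL_map should add its start address here, in
 * layout order.
 */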
>> +/* Default values */
>> +unsigned long page_offset_base = __PAGE_OFFSET_BASE;
>> +EXPORT_SYMBOL(page_offset_base);
>> +unsigned long vmalloc_base = __VMALLOC_BASE;
>> +EXPORT_SYMBOL(vmalloc_base);
>> +unsigned long vmemmap_base = __VMEMMAP_BASE;
>> +EXPORT_SYMBOL(vmemmap_base);
>> +
>> +/* Describe each randomized memory sections in sequential order */
>> +static __initdata struct kaslr_memory_region {
>> + unsigned long *base;
>> + unsigned short size_tb;
>> +} kaslr_regions[] = {
>> + { &page_offset_base, 64/* Maximum */ },
>> + { &vmalloc_base, VMALLOC_SIZE_TB },
>> + { &vmemmap_base, 1 },
>> +};
>
> So this comment is totally misleading. This area does not 'describe' each memory
> section, it actually puts live pointers to virtual memory address offset control
> variables used by other subsystems into this array - allowing us to randomize them
> programmatically.
>
> This patch should be split up into 4 components:
>
> - core patch adding all the infrastructure but not actually randomizing anything,
> the kaslr_regions[] array is empty.
>
> - a patch adding page_offset_base randomization.
>
> - a patch adding vmalloc_base randomization.
>
> - a patch adding vmemmap_base randomization.
>
> This way if any breakage is introduced by this randomization it can be bisected to
> in a fine-grained fashion. It would also have saved 5 minutes of my life trying
> to interpret the code here :-/
>
> In fact I'd suggest a series of 7 patches, with each randomization patch split
> into two further patches:
>
> - a patch making address offset variables dynamic.
>
> - a patch actually enabling the randomization of that virtual address range.
>
> This structure allows us to carefully evaluate the code size and runtime impact of
> randomizing a given range.
>
Makes sense. I will split it into the 4 components as described.
>> +
>> +/* Size in Terabytes + 1 hole */
>> +static __init unsigned long get_padding(struct kaslr_memory_region *region)
>> +{
>> + return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
>> +}
>
> So why is size_tb a short, forcing ugly type casts like this? Also, since this
> appears to be a trivial shift, why isn't this an inline?
>
Changed.
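I.e. something like this (a sketch, assuming size_tb is widened to
unsigned long at the same time):

/* Size in terabytes of the region, plus a 1 TB hole after it */
static inline unsigned long get_padding(struct kaslr_memory_region *region)
{
	return (region->size_tb + 1) << TB_SHIFT;
}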
>> +/* Initialize base and padding for each memory section randomized with KASLR */
>> +void __init kernel_randomize_memory(void)
>> +{
>> + size_t i;
>> + unsigned long addr = memory_rand_start;
>> + unsigned long padding, rand, mem_tb;
>> + struct rnd_state rnd_st;
>> + unsigned long remain_padding = memory_rand_end - memory_rand_start;
>
> Yeah, so these variable names are awful :-/
>
> Look at how postfix and prefix naming is mixed in the same function:
> 'memory_rand_start', but 'remain_padding'. Please use postfixes for all of them.
>
> Also why is one variable named 'rand', another 'rnd'? Did we run out of 'a'
> characters?
>
> Similarly, we have 'memory' and 'mem'. Pick one variant and harmonize!
>
> This patch is an object lesson in how fragile, insecure code is written.
>
Will change.
>> +
>> + /*
>> + * All these BUILD_BUG_ON checks ensure the memory layout is
>> + * consistent with the current KASLR design.
>> + */
>> + BUILD_BUG_ON(memory_rand_start >= memory_rand_end);
>> + BUILD_BUG_ON(config_enabled(CONFIG_KASAN) &&
>> + memory_rand_end >= ESPFIX_BASE_ADDR);
>> + BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
>> + config_enabled(CONFIG_X86_ESPFIX64)) &&
>> + memory_rand_end >= EFI_VA_START);
>> + BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
>> + config_enabled(CONFIG_X86_ESPFIX64) ||
>> + config_enabled(CONFIG_EFI)) &&
>> + memory_rand_end >= __START_KERNEL_map);
>> + BUILD_BUG_ON(memory_rand_end > __START_KERNEL_map);
>> +
>> + if (!kaslr_enabled())
>> + return;
>> +
>> + BUG_ON(kaslr_regions[0].base != &page_offset_base);
>> + mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
>
> Why is the granularity of the randomization 1TB? I don't see this explained
> anywhere. Some of the randomization may have PUD-granularity limitation - but PUD
> entries have a size of 512 GB.
>
> I think we should extend the kaslr_regions[] table with a minimum required
> alignment field instead of random granularity limitations.
>
I will change it so that the next region can start on the next PUD
entry with minimum padding.
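The end of the loop body would then look roughly like this (sketch):

	/*
	 * Jump over the region and its randomized padding, and align
	 * the start of the next region on the next PUD boundary.
	 */
	vaddr += get_padding(&kaslr_regions[i]);
	vaddr = round_up(vaddr + 1, PUD_SIZE);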
>> +/*
>> + * Create PGD aligned trampoline table to allow real mode initialization
>> + * of additional CPUs. Consume only 1 low memory page.
>> + */
>> +void __meminit init_trampoline(void)
>
> This too should probably be a separate patch, for bisectability and reviewability
> reasons.
>
Will do.
>> + unsigned long addr, next;
>> + pgd_t *pgd;
>> + pud_t *pud_page, *tr_pud_page;
>> + int i;
>
> I had to look twice to understand that 'tr' stands for 'trampoline'. Please rename
> to 'pud_page_tramp' or so.
>
Will do.
>> + if (!kaslr_enabled())
>> + return;
>> +
>> + tr_pud_page = alloc_low_page();
>> + set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
>
> Shouldn't we set it _after_ we've initialized the tr_pud_page page table?
>
> Also, why _PAGE_TABLE? AFAICS using _PAGE_TABLE adds _PAGE_USER.
>
It is used by pgd_populate() in kernel_physical_mapping_init(). I will
update it to _KERNPG_TABLE.
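And I will move the set_pgd() after the loop that fills the trampoline
PUD page, so the entry is only published once the table is initialized,
roughly (sketch, with the rename you suggested):

	/* Publish the filled trampoline PUD page, without _PAGE_USER. */
	set_pgd(&trampoline_pgd_entry,
		__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));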
>> +
>> + addr = 0;
>> + pgd = pgd_offset_k((unsigned long)__va(addr));
>
> Why the type cast?
>
> Also, similar complaint to the first patch: why is a physical address named in an
> ambiguous fashion, as 'addr'?
>
Will adapt it as in the other patch.
>> + pud_page = (pud_t *) pgd_page_vaddr(*pgd);
>
> Side note: it appears all users of pgd_page_vaddr() are casting it immediately to
> a pointer type. Could we change the return type of this function to 'void *' or
> 'pud_t *'? AFAICS it's mostly used in platform code.
>
I agree but I think that should be a separate change.
>> +
>> + for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
>> + pud_t *pud, *tr_pud;
>> +
>> + tr_pud = tr_pud_page + pud_index(addr);
>> + pud = pud_page + pud_index((unsigned long)__va(addr));
>
> Why is this type cast needed?
>
Because __va() returns a pointer and pud_index() expects an unsigned
long. I will edit it in a similar way as the other patch.
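I.e. doing the conversion once into a properly typed variable (sketch;
'paddr' being the renamed physical 'addr' from the earlier comment):

	/* __va() returns a pointer; pud_index() wants an unsigned long. */
	unsigned long vaddr = (unsigned long)__va(paddr);

	pud = pud_page + pud_index(vaddr);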
>> + next = (addr & PUD_MASK) + PUD_SIZE;
>> +
>> + /* Needed to copy pte or pud alike */
>> + BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
>> + *tr_pud = *pud;
>
> So I don't understand what brought the size of 'pte' into this code. We are
> copying pud entries around AFAICS, why should the size of the pte entry matter?
>
To support PSE. Depending on the page_size_mask, this entry can be a
pud entry or a pte. I will remove the BUILD_BUG_ON, it is confusing.
> Thanks,
>
> Ingo