Date: Tue, 9 Mar 2021 15:49:13 +0200
From: Topi Miettinen <toiwoton@...il.com>
To: Kees Cook <keescook@...omium.org>
Cc: linux-hardening@...r.kernel.org, akpm@...ux-foundation.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Andy Lutomirski <luto@...nel.org>, Jann Horn <jannh@...gle.com>,
	Linux API <linux-api@...r.kernel.org>,
	Matthew Wilcox <willy@...radead.org>,
	Mike Rapoport <rppt@...nel.org>, Vlad Rezki <urezki@...il.com>
Subject: Re: [PATCH v3] mm/vmalloc: randomize vmalloc() allocations

On 8.3.2021 23.38, Kees Cook wrote:
> On Mon, Feb 15, 2021 at 10:26:34PM +0200, Topi Miettinen wrote:
>> Memory mappings inside the kernel allocated with vmalloc() are in
>> predictable order and packed tightly toward the low addresses, except
>> for the per-cpu areas, which start from the top of the vmalloc area.
>> With the new kernel boot parameter 'randomize_vmalloc=1', the entire
>> area is used randomly to make the allocations less predictable and
>> harder to guess for attackers. Module and BPF code locations also get
>> randomized (within their dedicated and rather small area, though),
>> and if CONFIG_VMAP_STACK is enabled, so do kernel thread stack
>> locations.
>>
>> On 32-bit systems this may cause problems due to increased VM
>> fragmentation if the address space gets crowded.
>>
>> On all systems, it will reduce performance and increase memory and
>> cache usage due to less efficient use of page tables and the
>> inability to merge adjacent VMAs with compatible attributes. On
>> x86_64 with 5-level page tables, in the worst case, up to 4 pages of
>> additional page table entries are created for each mapping, so with
>> small mappings there's a considerable penalty.
>>
>> Without randomize_vmalloc=1:
>> $ grep -v kernel_clone /proc/vmallocinfo
>> 0xffffc90000000000-0xffffc90000009000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
>> 0xffffc90000009000-0xffffc9000000b000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
>> 0xffffc9000000c000-0xffffc9000000f000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
>> 0xffffc9000000f000-0xffffc90000011000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
>> 0xffffc90000011000-0xffffc90000013000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffc90000013000-0xffffc90000015000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffc90000015000-0xffffc90000017000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffc90000021000-0xffffc90000023000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffc90000023000-0xffffc90000025000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
>> 0xffffc90000025000-0xffffc90000027000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
>> 0xffffc90000031000-0xffffc90000036000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
>> 0xffffc90000043000-0xffffc90000047000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
>> 0xffffc90000211000-0xffffc90000232000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
>> 0xffffc90000232000-0xffffc90000253000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
>> 0xffffc900005a9000-0xffffc900005ba000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
>> 0xffffc900005ba000-0xffffc900005cc000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
>> 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc
>>
>> With randomize_vmalloc=1, the allocations are randomized:
>> $ grep -v kernel_clone /proc/vmallocinfo
>> 0xffffc9759d443000-0xffffc9759d445000 8192 hpet_enable+0x31/0x4a4 phys=0x00000000fed00000 ioremap
>> 0xffffccf1e9f66000-0xffffccf1e9f68000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffcd2fc02a4000-0xffffcd2fc02a6000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffcdaefb898000-0xffffcdaefb89b000 12288 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe0000 ioremap
>> 0xffffcef8074c3000-0xffffcef8074cc000 36864 irq_init_percpu_irqstack+0x176/0x1c0 vmap
>> 0xffffcf725ca2e000-0xffffcf725ca4f000 135168 crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
>> 0xffffd0efb25e1000-0xffffd0efb25f2000 69632 pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
>> 0xffffd27054678000-0xffffd2705467c000 16384 n_tty_open+0x11/0xe0 pages=3 vmalloc
>> 0xffffd2adf716e000-0xffffd2adf7180000 73728 pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
>> 0xffffd4ba5fb6b000-0xffffd4ba5fb6d000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x000000001ffe1000 ioremap
>> 0xffffded126192000-0xffffded126194000 8192 memremap+0x19c/0x280 phys=0x00000000000f5000 ioremap
>> 0xffffe01a4dbcd000-0xffffe01a4dbcf000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffe4b649952000-0xffffe4b649954000 8192 acpi_os_map_iomem+0x2ac/0x2d0 phys=0x00000000fed00000 ioremap
>> 0xffffe71ed592a000-0xffffe71ed592c000 8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
>> 0xffffe7dc5824f000-0xffffe7dc58270000 135168 crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
>> 0xffffe8f4f9800000-0xffffe8f4f9a00000 2097152 pcpu_get_vm_areas+0x0/0x2290 vmalloc
>> 0xffffe8f4f9a19000-0xffffe8f4f9a1e000 20480 pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
>>
>> With CONFIG_VMAP_STACK, kernel thread stacks are also placed in the
>> vmalloc area and therefore get randomized too (only one example line
>> from /proc/vmallocinfo shown for brevity):
>>
>> unrandomized:
>> 0xffffc90000018000-0xffffc90000021000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc
>>
>> randomized:
>> 0xffffcb57611a8000-0xffffcb57611b1000 36864 kernel_clone+0xf9/0x560 pages=8 vmalloc
>>
>> CC: Andrew Morton <akpm@...ux-foundation.org>
>> CC: Andy Lutomirski <luto@...nel.org>
>> CC: Jann Horn <jannh@...gle.com>
>> CC: Kees Cook <keescook@...omium.org>
>> CC: Linux API <linux-api@...r.kernel.org>
>> CC: Matthew Wilcox <willy@...radead.org>
>> CC: Mike Rapoport <rppt@...nel.org>
>> CC: Vlad Rezki <urezki@...il.com>
>> Signed-off-by: Topi Miettinen <toiwoton@...il.com>
>
> Thanks for working on this! I'd like to see this in the kernel, even
> if it's only for the more paranoid crowd. Have you done any workload
> measurements to see how much of a hit this has in the real world?

I compiled the kernel a few times on an idle machine, and with
randomize_vmalloc=1 the real times were 0.3% slower. I'm not sure this
is a good measure of kernel performance, though. The randomization
should increase the use of page tables, and the extra entries should
also compete for CPU caches.
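To make the offset arithmetic in the patch below concrete, here is a
minimal userspace sketch of it. This is an illustration only, not
kernel code: roundup_pow_of_two() and PAGE_ALIGN() are reimplemented
as stand-ins for the kernel helpers, and getrandom(2) stands in for
get_random_long().

#include <stdio.h>
#include <sys/random.h>

#define PAGE_SIZE 4096UL
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Smallest power of two >= x; stand-in for the kernel helper. */
static unsigned long roundup_pow_of_two(unsigned long x)
{
	unsigned long p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

/* Pick a page-aligned random offset into [vstart, vend) that still
 * leaves room for a mapping of 'size' bytes. */
static unsigned long pick_voffset(unsigned long vstart, unsigned long vend,
				  unsigned long size)
{
	unsigned long voffset;

	/* get_random_long() stand-in; fall back to no offset on error. */
	if (getrandom(&voffset, sizeof(voffset), 0) != sizeof(voffset))
		return 0;
	/* Masking with the next power of two is cheap but can overshoot
	 * a range that is not a power of two; the clamp below fixes that. */
	voffset &= roundup_pow_of_two(vend - vstart) - 1;
	voffset = PAGE_ALIGN(voffset);
	if (voffset + size > vend - vstart)
		voffset = vend - vstart - size;
	return voffset;
}

int main(void)
{
	/* Range shaped like the x86-64 vmalloc space, for illustration. */
	const unsigned long vstart = 0xffffc90000000000UL;
	const unsigned long vend = 0xffffe90000000000UL;
	int i;

	for (i = 0; i < 4; i++)
		printf("base: 0x%lx\n",
		       vstart + pick_voffset(vstart, vend, 2 * PAGE_SIZE));
	return 0;
}

The final clamp is what keeps the offset inside the range while leaving
room for the mapping, since the power-of-two mask alone may produce an
offset past the end of a non-power-of-two range.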
>> ---
>> v2: retry allocation from other end of vmalloc space in case of
>> failure (Matthew Wilcox), improve commit message and documentation
>> v3: randomize also percpu allocations (pcpu_get_vm_areas())
>> ---
>>  .../admin-guide/kernel-parameters.txt | 23 ++++++++++++
>>  mm/vmalloc.c                          | 36 +++++++++++++++++--
>>  2 files changed, 56 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index c722ec19cd00..38d6b5728ccc 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -4024,6 +4024,29 @@
>>
>>  	ramdisk_start=	[RAM] RAM disk image start address
>>
>> +	randomize_vmalloc= [KNL] Randomize vmalloc() allocations. With 1,
>> +			the entire vmalloc() area is used randomly to
>> +			make the allocations less predictable and
>> +			harder to guess for attackers. Also module and
>> +			BPF code locations get randomized (within
>> +			their dedicated and rather small area though)
>> +			and if CONFIG_VMAP_STACK is enabled, also
>> +			kernel thread stack locations.
>> +
>> +			On 32 bit systems this may cause problems due
>> +			to increased VM fragmentation if the address
>> +			space gets crowded.
>> +
>> +			On all systems, it will reduce performance and
>> +			increase memory and cache usage due to less
>> +			efficient use of page tables and inability to
>> +			merge adjacent VMAs with compatible
>> +			attributes. On x86_64 with 5 level page
>> +			tables, in the worst case, additional page
>> +			table entries of up to 4 pages are created for
>> +			each mapping, so with small mappings there's
>> +			considerable penalty.
>> +
>>  	random.trust_cpu={on,off}
>>  			[KNL] Enable or disable trusting the use of the
>>  			CPU's random number generator (if available) to
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 4d88fe5a277a..1e8e0ee1925f 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -34,6 +34,7 @@
>>  #include <linux/bitops.h>
>>  #include <linux/rbtree_augmented.h>
>>  #include <linux/overflow.h>
>> +#include <linux/random.h>
>>
>>  #include <linux/uaccess.h>
>>  #include <asm/tlbflush.h>
>> @@ -1089,6 +1090,17 @@ adjust_va_to_fit_type(struct vmap_area *va,
>>  	return 0;
>>  }
>>
>> +static int randomize_vmalloc = 0;
>
> This should be __ro_after_init, and even better, it should be a static
> branch so there's no performance hit at all for the "disabled" case.
> I recommend:
>
> static DEFINE_STATIC_KEY_FALSE_RO(randomize_vmalloc);
>
> static int __init set_randomize_vmalloc(char *str)
> {
> 	int ret;
> 	bool bool_result;
>
> 	ret = kstrtobool(str, &bool_result);
> 	if (ret)
> 		return ret;
>
> 	if (bool_result)
> 		static_branch_enable(&randomize_vmalloc);
> 	else
> 		static_branch_disable(&randomize_vmalloc);
> 	return 1;
> }
> __setup("randomize_vmalloc=", set_randomize_vmalloc);
>
>> +
>>  /*
>>   * Returns a start address of the newly allocated area, if success.
>>   * Otherwise a vend is returned that indicates failure.
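For reference, the pieces of this suggestion fit together as follows: a
read-only static key, a __setup() handler that parses the boot
parameter with kstrtobool(), and a static_branch_unlikely() test at the
use site that compiles to a patched NOP while the key is off. A sketch
assembled under kernel-context assumptions; the choice to report a
malformed value as unhandled (return 0) is one reasonable option, and
the kstrtobool() header location varies by tree:

#include <linux/init.h>
#include <linux/jump_label.h>
#include <linux/kernel.h>	/* kstrtobool() on trees of this era */

static DEFINE_STATIC_KEY_FALSE_RO(randomize_vmalloc);

static int __init set_randomize_vmalloc(char *str)
{
	bool enable;

	if (kstrtobool(str, &enable))
		return 0;	/* malformed value: report as unhandled */

	if (enable)
		static_branch_enable(&randomize_vmalloc);
	return 1;		/* __setup() handlers return 1 when handled */
}
__setup("randomize_vmalloc=", set_randomize_vmalloc);

Because the _RO variant places the key in __ro_after_init memory, it
can only be flipped during early boot, which matches a
boot-parameter-only interface.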
>> @@ -1162,7 +1174,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>>  					int node, gfp_t gfp_mask)
>>  {
>>  	struct vmap_area *va, *pva;
>> -	unsigned long addr;
>> +	unsigned long addr, voffset;
>>  	int purged = 0;
>>  	int ret;
>>
>> @@ -1217,11 +1229,24 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>>  	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
>>  		kmem_cache_free(vmap_area_cachep, pva);
>>
>> +	/* Randomize allocation */
>> +	if (randomize_vmalloc) {
>
> 	if (static_branch_unlikely(&randomize_vmalloc)) {
> 	...

Thanks, I'll post an updated version with static_branch.

>
>> +		voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
>> +		voffset = PAGE_ALIGN(voffset);
>> +		if (voffset + size > vend - vstart)
>> +			voffset = vend - vstart - size;
>> +	} else
>> +		voffset = 0;
>> +
>>  	/*
>>  	 * If an allocation fails, the "vend" address is
>>  	 * returned. Therefore trigger the overflow path.
>>  	 */
>> -	addr = __alloc_vmap_area(size, align, vstart, vend);
>> +	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);
>> +
>> +	if (unlikely(addr == vend) && voffset)
>> +		/* Retry randomization from other end */
>> +		addr = __alloc_vmap_area(size, align, vstart, vstart + voffset + size);
>>  	spin_unlock(&free_vmap_area_lock);
>>
>>  	if (unlikely(addr == vend))
>> @@ -3256,7 +3281,12 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>>  		start = offsets[area];
>>  		end = start + sizes[area];
>>
>> -		va = pvm_find_va_enclose_addr(vmalloc_end);
>> +		if (randomize_vmalloc)
>> +			va = pvm_find_va_enclose_addr(vmalloc_start +
>> +				(get_random_long() &
>> +				 (roundup_pow_of_two(vmalloc_end - vmalloc_start) - 1)));
>> +		else
>> +			va = pvm_find_va_enclose_addr(vmalloc_end);
>>  		base = pvm_determine_end_from_reverse(&va, align) - end;
>>
>>  		while (true) {
>> --
>> 2.30.0
>>
>
> But otherwise, yes please. It's a simple change that makes the
> per-boot order of vmalloc allocations unpredictable. I'm for it! :)

Great, and thanks for the review!

-Topi
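A note on the retry logic above, sketched as standalone code: the
allocation first searches [vstart + voffset, vend) and, only on
failure, the lower window [vstart, vstart + voffset + size). In this
sketch, find_fit() is a hypothetical stand-in for __alloc_vmap_area()
and is assumed to return 'vend' on failure, as the real function does.

typedef unsigned long (*find_fit_fn)(unsigned long size, unsigned long align,
				     unsigned long start, unsigned long end);

static unsigned long alloc_randomized(find_fit_fn find_fit,
				      unsigned long size, unsigned long align,
				      unsigned long vstart, unsigned long vend,
				      unsigned long voffset)
{
	unsigned long addr;

	/* First try: from the randomly chosen offset up to the top. */
	addr = find_fit(size, align, vstart + voffset, vend);

	/* On failure, retry in the window below the random split point. */
	if (addr == vend && voffset)
		addr = find_fit(size, align, vstart, vstart + voffset + size);

	return addr;	/* addr == vend still signals failure */
}

Extending the retry window by 'size' keeps an allocation that would
straddle the split point reachable, so the two windows together cover
every placement that an unrandomized search of [vstart, vend) could
have found.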