lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9d34fb0a-7aba-1e84-6426-006ea7c3d9f5@gmail.com>
Date:   Wed, 2 Dec 2020 20:49:06 +0200
From:   Topi Miettinen <toiwoton@...il.com>
To:     linux-hardening@...r.kernel.org, akpm@...ux-foundation.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc:     Andy Lutomirski <luto@...nel.org>, Jann Horn <jannh@...gle.com>,
        Kees Cook <keescook@...omium.org>,
        Linux API <linux-api@...r.kernel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Mike Rapoport <rppt@...nel.org>
Subject: Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations

On 1.12.2020 23.45, Topi Miettinen wrote:
> Memory mappings inside kernel allocated with vmalloc() are in
> predictable order and packed tightly toward the low addresses. With
> new kernel boot parameter 'randomize_vmalloc=1', the entire area is
> used randomly to make the allocations less predictable and harder to
> guess for attackers.
> 
> Without randomize_vmalloc=1:
> $ cat /proc/vmallocinfo
> 0xffffc90000000000-0xffffc90000002000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe1000 ioremap
> 0xffffc90000002000-0xffffc90000005000   12288 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe0000 ioremap
> 0xffffc90000005000-0xffffc90000007000    8192 hpet_enable+0x36/0x4a9 phys=0x00000000fed00000 ioremap
> 0xffffc90000007000-0xffffc90000009000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc90000009000-0xffffc9000000b000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc9000000b000-0xffffc9000000d000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc9000000d000-0xffffc9000000f000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffc90000011000-0xffffc90000015000   16384 n_tty_open+0x16/0xe0 pages=3 vmalloc
> 0xffffc900003de000-0xffffc900003e0000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x00000000fed00000 ioremap
> 0xffffc900003e0000-0xffffc900003e2000    8192 memremap+0x1a1/0x280 phys=0x00000000000f5000 ioremap
> 0xffffc900003e2000-0xffffc900003f3000   69632 pcpu_create_chunk+0x80/0x2c0 pages=16 vmalloc
> 0xffffc900003f3000-0xffffc90000405000   73728 pcpu_create_chunk+0xb7/0x2c0 pages=17 vmalloc
> 0xffffc90000405000-0xffffc9000040a000   20480 pcpu_create_chunk+0xed/0x2c0 pages=4 vmalloc
> 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x1a40 vmalloc
> 
> With randomize_vmalloc=1, the allocations are randomized:
> $ cat /proc/vmallocinfo
> 0xffffca3a36442000-0xffffca3a36447000   20480 pcpu_create_chunk+0xed/0x2c0 pages=4 vmalloc
> 0xffffca63034d6000-0xffffca63034d9000   12288 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe0000 ioremap
> 0xffffcce23d32e000-0xffffcce23d330000    8192 memremap+0x1a1/0x280 phys=0x00000000000f5000 ioremap
> 0xffffcfb9f0e22000-0xffffcfb9f0e24000    8192 hpet_enable+0x36/0x4a9 phys=0x00000000fed00000 ioremap
> 0xffffd1df23e9e000-0xffffd1df23eb0000   73728 pcpu_create_chunk+0xb7/0x2c0 pages=17 vmalloc
> 0xffffd690c2990000-0xffffd690c2992000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x000000003ffe1000 ioremap
> 0xffffd8460c718000-0xffffd8460c71c000   16384 n_tty_open+0x16/0xe0 pages=3 vmalloc
> 0xffffd89aba709000-0xffffd89aba70b000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe0ca3f2ed000-0xffffe0ca3f2ef000    8192 acpi_os_map_iomem+0x29e/0x2c0 phys=0x00000000fed00000 ioremap
> 0xffffe3ba44802000-0xffffe3ba44804000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe4524b2a2000-0xffffe4524b2a4000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe61372b2e000-0xffffe61372b30000    8192 gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> 0xffffe704d2f7c000-0xffffe704d2f8d000   69632 pcpu_create_chunk+0x80/0x2c0 pages=16 vmalloc
> 0xffffe8ffffc00000-0xffffe8ffffe00000 2097152 pcpu_get_vm_areas+0x0/0x1a40 vmalloc
> 
> CC: Andrew Morton <akpm@...ux-foundation.org>
> CC: Andy Lutomirski <luto@...nel.org>
> CC: Jann Horn <jannh@...gle.com>
> CC: Kees Cook <keescook@...omium.org>
> CC: Linux API <linux-api@...r.kernel.org>
> CC: Matthew Wilcox <willy@...radead.org>
> CC: Mike Rapoport <rppt@...nel.org>
> Signed-off-by: Topi Miettinen <toiwoton@...il.com>
> ---
>   .../admin-guide/kernel-parameters.txt         |  2 ++
>   mm/vmalloc.c                                  | 25 +++++++++++++++++--
>   2 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 44fde25bb221..a0242e31d2d8 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4017,6 +4017,8 @@
>   
>   	ramdisk_start=	[RAM] RAM disk image start address
>   
> +	randomize_vmalloc= [KNL] Randomize vmalloc() allocations.
> +
>   	random.trust_cpu={on,off}
>   			[KNL] Enable or disable trusting the use of the
>   			CPU's random number generator (if available) to
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6ae491a8b210..a5f7bb46ddf2 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -34,6 +34,7 @@
>   #include <linux/bitops.h>
>   #include <linux/rbtree_augmented.h>
>   #include <linux/overflow.h>
> +#include <linux/random.h>
>   
>   #include <linux/uaccess.h>
>   #include <asm/tlbflush.h>
> @@ -1079,6 +1080,17 @@ adjust_va_to_fit_type(struct vmap_area *va,
>   	return 0;
>   }
>   
> +static int randomize_vmalloc = 0;
> +
> +static int __init set_randomize_vmalloc(char *str)
> +{
> +	if (!str)
> +		return 0;
> +	randomize_vmalloc = simple_strtoul(str, &str, 0);
> +	return 1;
> +}
> +__setup("randomize_vmalloc=", set_randomize_vmalloc);
> +
>   /*
>    * Returns a start address of the newly allocated area, if success.
>    * Otherwise a vend is returned that indicates failure.
> @@ -1152,7 +1164,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>   				int node, gfp_t gfp_mask)
>   {
>   	struct vmap_area *va, *pva;
> -	unsigned long addr;
> +	unsigned long addr, voffset;
>   	int purged = 0;
>   	int ret;
>   
> @@ -1207,11 +1219,20 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>   	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
>   		kmem_cache_free(vmap_area_cachep, pva);
>   
> +	/* Randomize allocation */
> +	if (randomize_vmalloc) {
> +		voffset = get_random_long() & (roundup_pow_of_two(vend - vstart) - 1);
> +		voffset = PAGE_ALIGN(voffset);
> +		if (voffset + size > vend - vstart)
> +			voffset = vend - vstart - size;
> +	} else
> +		voffset = 0;
> +
>   	/*
>   	 * If an allocation fails, the "vend" address is
>   	 * returned. Therefore trigger the overflow path.
>   	 */
> -	addr = __alloc_vmap_area(size, align, vstart, vend);
> +	addr = __alloc_vmap_area(size, align, vstart + voffset, vend);

Does not work so well after all:

Dec 02 18:25:01 kernel: systemd-udevd: vmalloc: allocation failure: 
10526720 bytes, mode:0xcc0(GFP_KERNEL), 
nodemask=(null),cpuset=/,mems_allowed=0
Dec 02 18:25:01 kernel: CPU: 12 PID: 716 Comm: systemd-udevd Tainted: G 
            E     5.10.0-rc5+ #25
Dec 02 18:25:01 kernel: Hardware name: <redacted>
Dec 02 18:25:01 kernel: Call Trace:
Dec 02 18:25:01 kernel:  dump_stack+0x7d/0xa3
Dec 02 18:25:01 kernel:  warn_alloc.cold+0x83/0x126
Dec 02 18:25:01 kernel:  ? zone_watermark_ok_safe+0x140/0x140
Dec 02 18:25:01 kernel:  ? __kasan_slab_free+0x122/0x150
Dec 02 18:25:01 kernel:  ? slab_free_freelist_hook+0x66/0x110
Dec 02 18:25:01 kernel:  ? kfree+0xba/0x3e0
Dec 02 18:25:01 kernel:  __vmalloc_node_range+0xd7/0xf0
Dec 02 18:25:01 kernel:  ? load_module+0x29e0/0x3f40
Dec 02 18:25:01 kernel:  module_alloc+0x9f/0x110
Dec 02 18:25:01 kernel:  ? load_module+0x29e0/0x3f40
Dec 02 18:25:01 kernel:  load_module+0x29e0/0x3f40
Dec 02 18:25:01 kernel:  ? ima_post_read_file+0x140/0x150
Dec 02 18:25:01 kernel:  ? module_frob_arch_sections+0x20/0x20
Dec 02 18:25:01 kernel:  ? kernel_read_file+0x1d2/0x3e0
Dec 02 18:25:01 kernel:  ? __x64_sys_fsopen+0x1f0/0x1f0
Dec 02 18:25:01 kernel:  ? up_write+0x92/0x140
Dec 02 18:25:01 kernel:  ? downgrade_write+0x160/0x160
Dec 02 18:25:01 kernel:  ? kernel_read_file_from_fd+0x4b/0x90
Dec 02 18:25:01 kernel:  __do_sys_finit_module+0x110/0x1a0
Dec 02 18:25:01 kernel:  ? __x64_sys_init_module+0x50/0x50
Dec 02 18:25:01 kernel:  ? get_nth_filter.part.0+0x160/0x160
Dec 02 18:25:01 kernel:  ? randomize_stack_top+0x70/0x70
Dec 02 18:25:01 kernel:  ? __x64_sys_fstat+0x30/0x30
Dec 02 18:25:01 kernel:  ? __audit_syscall_entry+0x16a/0x1d0
Dec 02 18:25:01 kernel:  ? ktime_get_coarse_real_ts64+0x4a/0x70
Dec 02 18:25:01 kernel:  do_syscall_64+0x33/0x40
Dec 02 18:25:01 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 02 18:25:01 kernel: RIP: 0033:0xdd0fd2fb989
Dec 02 18:25:01 kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 
44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 
24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d7 54 0c 00 f7 d8 64 
89 01 48
Dec 02 18:25:01 kernel: RSP: 002b:00000ceb4f03f028 EFLAGS: 00000246 
ORIG_RAX: 0000000000000139
Dec 02 18:25:01 kernel: RAX: ffffffffffffffda RBX: 00000ef04a12fa90 RCX: 
00000dd0fd2fb989
Dec 02 18:25:01 kernel: RDX: 0000000000000000 RSI: 000003b119220e4d RDI: 
0000000000000017
Dec 02 18:25:01 kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 
00000ef04a11b018
Dec 02 18:25:01 kernel: R10: 0000000000000017 R11: 0000000000000246 R12: 
000003b119220e4d
Dec 02 18:25:01 kernel: R13: 0000000000000000 R14: 00000ef04a124a10 R15: 
00000ef04a12fa90
Dec 02 18:25:01 kernel: Mem-Info:
Dec 02 18:25:01 kernel: active_anon:96 inactive_anon:17667 isolated_anon:0
                          active_file:15598 inactive_file:35563 
isolated_file:0
                          unevictable:0 dirty:0 writeback:0
                          slab_reclaimable:8064 slab_unreclaimable:159447
                          mapped:10434 shmem:229 pagetables:5844 bounce:0
                          free:3176890 free_pcp:2892 free_cma:0
Dec 02 18:25:01 kernel: Node 0 active_anon:384kB inactive_anon:70668kB 
active_file:62392kB inactive_file:142252kB unevictable:0kB 
isolated(anon):0kB isolated(file):0kB mapped:41736kB dirty:0kB 
writeback:0kB shmem:916kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 
0kB wri>
Dec 02 18:25:01 kernel: DMA free:13860kB min:76kB low:92kB high:108kB 
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB 
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB 
present:15996kB managed:15908kB mlocked:0kB pagetables:0kB bounce:0kB 
free_pc>
Dec 02 18:25:01 kernel: lowmem_reserve[]: 0 2650 13377 13377 13377
Dec 02 18:25:01 kernel: DMA32 free:2790432kB min:13372kB low:16712kB 
high:20052kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB 
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB 
present:2796348kB managed:2796008kB mlocked:0kB pagetables:0kB bo>
Dec 02 18:25:01 kernel: lowmem_reserve[]: 0 0 10726 10726 10726
Dec 02 18:25:01 kernel: Normal free:9903268kB min:54128kB low:67660kB 
high:81192kB reserved_highatomic:0KB active_anon:384kB 
inactive_anon:70668kB active_file:62392kB inactive_file:142252kB 
unevictable:0kB writepending:0kB present:13356288kB managed:10991672kB 
mlocked:0kB>
Dec 02 18:25:01 kernel: lowmem_reserve[]: 0 0 0 0 0
Dec 02 18:25:01 kernel: DMA: 3*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 
2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 2*2048kB (UM) 
2*4096kB (M) = 13860kB
Dec 02 18:25:01 kernel: DMA32: 12*4kB (UM) 10*8kB (UM) 8*16kB (M) 9*32kB 
(M) 8*64kB (UM) 6*128kB (UM) 7*256kB (UM) 9*512kB (UM) 5*1024kB (UM) 
6*2048kB (M) 675*4096kB (M) = 2790432kB
Dec 02 18:25:01 kernel: Normal: 82*4kB (UE) 1*8kB (E) 3*16kB (UME) 
16*32kB (UM) 1*64kB (U) 1*128kB (U) 1*256kB (M) 6*512kB (UM) 8*1024kB 
(UME) 1*2048kB (E) 2414*4096kB (M) = 9902400kB

I suppose the random address happened to be too near 'vend' and no 
suitable block was found. Perhaps the search in __alloc_vmap_area() 
should then continue at 'vstart' instead (so __alloc_vmap_area() would 
be passed all three of vstart, voffset, vend instead of just 
vstart+voffset, vend).

This also seems to randomize module addresses. I was going to check that 
next, so nice surprise!

-Topi

>   	spin_unlock(&free_vmap_area_lock);
>   
>   	if (unlikely(addr == vend))
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ