lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 6 May 2016 11:16:50 -0700
From:	Kees Cook <keescook@...omium.org>
To:	Borislav Petkov <bp@...en8.de>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andy Lutomirski <luto@...nel.org>,
	Dave Young <dyoung@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andy Lutomirski <luto@...capital.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Brian Gerst <brgerst@...il.com>,
	Yinghai Lu <yinghai@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Vivek Goyal <vgoyal@...hat.com>, Baoquan He <bhe@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	"linux-tip-commits@...r.kernel.org" 
	<linux-tip-commits@...r.kernel.org>
Subject: Re: [tip:x86/boot] x86/KASLR: Consolidate mem_avoid[] entries

On Fri, May 6, 2016 at 9:08 AM, Borislav Petkov <bp@...en8.de> wrote:
> On Fri, May 06, 2016 at 12:46:05AM -0700, tip-bot for Yinghai Lu wrote:
>> Commit-ID:  9dc1969c24eff8b7d7a9a565d1047b624921ba06
>> Gitweb:     http://git.kernel.org/tip/9dc1969c24eff8b7d7a9a565d1047b624921ba06
>> Author:     Yinghai Lu <yinghai@...nel.org>
>> AuthorDate: Thu, 5 May 2016 15:13:47 -0700
>> Committer:  Ingo Molnar <mingo@...nel.org>
>> CommitDate: Fri, 6 May 2016 09:00:59 +0200
>>
>> x86/KASLR: Consolidate mem_avoid[] entries
>>
>> The mem_avoid[] array is used to track positions that should be avoided (like
>> the compressed kernel, decompression code, etc) when selecting a memory
>> position for the randomly relocated kernel. Since ZO is now at the end of
>
> Can we stop with those ZO(MG) obfuscated abbreviations and call it
> simply the "compressed kernel image"?

I can expand them in the change logs, but it helps to keep reinforcing
their names since all the variables are named using these.

>> the decompression buffer and the decompression code (and its heap and
>> stack) are at the front, we can safely consolidate the decompression entry,
>> the heap entry, and the stack entry. The boot_params memory, however, could
>> be elsewhere, so it should be explicitly included.
>>
>> Signed-off-by: Yinghai Lu <yinghai@...nel.org>
>> Signed-off-by: Baoquan He <bhe@...hat.com>
>> [ Rwrote changelog, cleaned up code comments. ]
>> Signed-off-by: Kees Cook <keescook@...omium.org>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Andy Lutomirski <luto@...capital.net>
>> Cc: Andy Lutomirski <luto@...nel.org>
>> Cc: Borislav Petkov <bp@...en8.de>
>> Cc: Brian Gerst <brgerst@...il.com>
>> Cc: Dave Young <dyoung@...hat.com>
>> Cc: Denys Vlasenko <dvlasenk@...hat.com>
>> Cc: H. Peter Anvin <hpa@...or.com>
>> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
>> Cc: Peter Zijlstra <peterz@...radead.org>
>> Cc: Thomas Gleixner <tglx@...utronix.de>
>> Cc: Vivek Goyal <vgoyal@...hat.com>
>> Cc: kernel-hardening@...ts.openwall.com
>> Cc: lasse.collin@...aani.org
>> Link: http://lkml.kernel.org/r/1462486436-3707-3-git-send-email-keescook@chromium.org
>> Signed-off-by: Ingo Molnar <mingo@...nel.org>
>> ---
>>  arch/x86/boot/compressed/kaslr.c | 77 +++++++++++++++++++++++++++++++---------
>>  1 file changed, 61 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
>> index 2072d82..6392f00 100644
>> --- a/arch/x86/boot/compressed/kaslr.c
>> +++ b/arch/x86/boot/compressed/kaslr.c
>> @@ -121,7 +121,7 @@ struct mem_vector {
>>       unsigned long size;
>>  };
>>
>> -#define MEM_AVOID_MAX 5
>> +#define MEM_AVOID_MAX 4
>>  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>>
>>  static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
>> @@ -146,22 +146,71 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
>>       return true;
>>  }
>>
>> +/*
>> + * In theroy, KASLR can put the kernel anywhere in area of [16M, 64T). The
>          ^^^^^^
> Spellchecker please.

Gah, thanks. No matter how many times I run through these, I keep
adding more typos. :)

>
>                                                    in the range of [...
>
>> + * mem_avoid array is used to store the ranges that need to be avoided when
>> + * KASLR searches for a an appropriate random address. We must avoid any
>
>                         s/a //
>
>> + * regions that are unsafe to overlap with during decompression, and other
>> + * things like the initrd, cmdline and boot_params.
>> + *
>> + * How to calculate the unsafe areas is detailed here, and is informed by
>> + * the decompression calculations in header.S, and the diagram in misc.c.
>> + *
>> + * The compressed vmlinux (ZO) plus relocs and the run space of ZO can't be
>> + * overwritten by decompression output.
>> + *
>> + * ZO sits against the end of the decompression buffer, so we can calculate
>> + * where text, data, bss, etc of ZO are positioned.
>> + *
>> + * The follow are already enforced by the code:
>
>         " ... following conditions are ..."
>
>> + *  - init_size >= kernel_total_size
>> + *  - input + input_len >= output + output_len
>> + *  - kernel_total_size could be >= or < output_len
>> + *
>> + * From this, we can make several observations, illustrated by a diagram:
>> + *  - init_size >= kernel_total_size
>
> You just said this above?!
>
>> + *  - input + input_len > output + output_len
>> + *  - kernel_total_size >= output_len
>
> Ok, what is the point of the observations? To show that we're making the
> conditions enforced by the code more concrete, stricter, ...?

This was an earlier attempt by Baoquan to fully explain the reasoning
in this code since I couldn't understand it. He added the specific
conditions, observations, and added the diagram. The goal is to prove
that the changes to mem_avoid are safe since mistakes here lead to
really hard to debug bugs.

>> + *
>> + * 0   output            input            input+input_len    output+init_size
>> + * |     |                 |                       |                       |
>> + * |     |                 |                       |                       |
>> + * |-----|--------|--------|------------------|----|------------|----------|
>> + *                |                           |                 |
>> + *                |                           |                 |
>> + * output+init_size-ZO_INIT_SIZE   output+output_len  output+kernel_total_size
>> + *
>> + * [output, output+init_size) is for the buffer for decompressing the
>> + * compressed kernel (ZO).
>> + *
>> + * [output, output+kernel_total_size) is for the uncompressed kernel (VO)
>> + * and its bss, brk, etc.
>> + * [output, output+output_len) is VO plus relocs
>> + *
>> + * [output+init_size-ZO_INIT_SIZE, output+init_size) is the copied ZO.
>
> ..., i.e., ZO_INIT_SIZE.

Well, no, these are ranges, so literally what it says.
"output+init_size-ZO_INIT_SIZE" is the start of the compressed image
(ZO). It's position is now found from the end of the buffer, which is
output+init_size (VO's position plus VO's total run size) minus the
total run size of ZO.

>
>> + * [input, input+input_len) is the copied compressed (VO (vmlinux after
>> + * objcopy) plus relocs), not the ZO.
>> + *
>> + * [input+input_len, output+init_size) is [_text, _end) for ZO. That was the
>> + * first range in mem_avoid, which included ZO's heap and stack. Also
>> + * [input, input+input_size) need be put in mem_avoid array, but since it
>
>                         " ... needs to be in the mem_avoid ..."
>
>> + * is adjacent to the first entry, they can be merged. This is how we get
>> + * the first entry in mem_avoid[].
>> + */
>
> <--- my head smokes right about here. :-)

Heh. Yeah, and this is LESS confusing than when the ZO wasn't aligned
to the end of the buffer. A whole other set of conditions vanish now.
I will try to further explain these.

>>  static void mem_avoid_init(unsigned long input, unsigned long input_size,
>> -                        unsigned long output, unsigned long output_size)
>> +                        unsigned long output)
>>  {
>> +     unsigned long init_size = boot_params->hdr.init_size;
>>       u64 initrd_start, initrd_size;
>>       u64 cmd_line, cmd_line_size;
>> -     unsigned long unsafe, unsafe_len;
>>       char *ptr;
>>
>>       /*
>>        * Avoid the region that is unsafe to overlap during
>> -      * decompression (see calculations in ../header.S).
>> +      * decompression.
>>        */
>> -     unsafe_len = (output_size >> 12) + 32768 + 18;
>> -     unsafe = (unsigned long)input + input_size - unsafe_len;
>> -     mem_avoid[0].start = unsafe;
>> -     mem_avoid[0].size = unsafe_len;
>> +     mem_avoid[0].start = input;
>> +     mem_avoid[0].size = (output + init_size) - input;
>
> There's a define above called MEM_AVOID_MAX but the array elements are
> adressed with naked numbers here. Perhaps if we had defines for them
> too, it'll make the code a bit more understandable:
>
>         mem_avoid[MEM_AVOID_ZO_RANGE] ...
>         mem_avoid[MEM_AVOID_INITRD] ...
>
> and so on.

Ah! Yes, excellent. I'll actually use an enum so I can get
MEM_AVOID_MAX automatically.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ