Date:   Wed, 1 Aug 2018 16:16:53 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Dave Hansen <dave.hansen@...ux.intel.com>
cc:     linux-kernel@...r.kernel.org, keescook@...gle.com,
        tglx@...utronix.de, mingo@...nel.org, aarcange@...hat.com,
        jgross@...e.com, jpoimboe@...hat.com, gregkh@...uxfoundation.org,
        peterz@...radead.org, hughd@...gle.com,
        torvalds@...ux-foundation.org, bp@...en8.de, luto@...nel.org,
        ak@...ux.intel.com
Subject: Re: [PATCH 5/5] x86/mm/init: remove freed kernel image areas from
 alias mapping

On Wed, 1 Aug 2018, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@...ux.intel.com>
> 
> The kernel image is mapped into two places in the virtual address
> space (addresses without KASLR, of course):
> 
> 	1. The kernel direct map (0xffff880000000000)
> 	2. The "high kernel map" (0xffffffff81000000)
> 
> We actually execute out of #2.  If we get the address of a kernel
> symbol, it points to #2, but almost all physical-to-virtual
> translations point to #1.
> 
> Parts of the "high kernel map" alias are mapped in the userspace
> page tables with the Global bit for performance reasons.  The
> parts that we map to userspace do not (er, should not) have
> secrets.
> 
> This is fine, except that some areas in the kernel image that
> are adjacent to the non-secret-containing areas are unused holes.
> We free these holes back into the normal page allocator and
> reuse them as normal kernel memory.  The memory will, of course,
> get *used* via the normal map, but the alias mapping is kept.
> 
> This otherwise unused alias mapping of the holes will, by default,
> keep the Global bit, be mapped out to userspace, and be
> vulnerable to Meltdown.
> 
> Remove the alias mapping of these pages entirely.  This is likely
> to fracture the 2M page mapping the kernel image near these areas,
> but this should affect a minority of the area.
> 
> This unmapping behavior is currently dependent on PTI being in
> place.  Going forward, we should at least consider doing this for
> all configurations.  Having an extra read-write alias for memory
> is not exactly ideal for debugging things like random memory
> corruption and this does undercut features like DEBUG_PAGEALLOC
> or future work like eXclusive Page Frame Ownership (XPFO).
> 
> Before this patch:
> 
> current_kernel:---[ High Kernel Mapping ]---
> current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
> current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
> current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
> current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
> current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
> current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
> current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
> current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd
> 
>   current_user:---[ High Kernel Mapping ]---
>   current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
>   current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
>   current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
>   current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
>   current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
>   current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
> 
> 
> After this patch:
> 
> current_kernel:---[ High Kernel Mapping ]---
> current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
> current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
> current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
> current_kernel-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
> current_kernel-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
> current_kernel-0xffffffff82488000-0xffffffff82600000        1504K                               pte
> current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
> current_kernel-0xffffffff82c00000-0xffffffff82c0d000          52K     RW                     NX pte
> current_kernel-0xffffffff82c0d000-0xffffffff82dc0000        1740K                               pte
> 
>   current_user:---[ High Kernel Mapping ]---
>   current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
>   current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
>   current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
>   current_user-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
>   current_user-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
>   current_user-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
>   current_user-0xffffffff82488000-0xffffffff82600000        1504K                               pte
>   current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
> 
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: Kees Cook <keescook@...gle.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Ingo Molnar <mingo@...nel.org>
> Cc: Andrea Arcangeli <aarcange@...hat.com>
> Cc: Juergen Gross <jgross@...e.com>
> Cc: Josh Poimboeuf <jpoimboe@...hat.com>
> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Hugh Dickins <hughd@...gle.com>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Borislav Petkov <bp@...en8.de>
> Cc: Andy Lutomirski <luto@...nel.org>
> Cc: Andi Kleen <ak@...ux.intel.com>
> ---
> 
>  b/arch/x86/mm/init.c |   22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff -puN arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image arch/x86/mm/init.c
> --- a/arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image	2018-07-30 09:53:14.862915689 -0700
> +++ b/arch/x86/mm/init.c	2018-07-30 09:53:14.866915689 -0700
> @@ -778,8 +778,26 @@ void free_init_pages(char *what, unsigne
>   */
>  void free_kernel_image_pages(void *begin, void *end)
>  {
> -	free_init_pages("unused kernel image",
> -			(unsigned long)begin, (unsigned long)end);
> +	unsigned long begin_ul = (unsigned long)begin;
> +	unsigned long end_ul = (unsigned long)end;
> +	unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
> +
> +
> +	free_init_pages("unused kernel image", begin_ul, end_ul);
> +
> +	/*
> +	 * PTI maps some of the kernel into userspace.  For
> +	 * performance, this includes some kernel areas that
> +	 * do not contain secrets.  Those areas might be
> +	 * adjacent to the parts of the kernel image being
> +	 * freed, which may contain secrets.  Remove the
> +	 * "high kernel image mapping" for these freed areas,
> +	 * ensuring they are not even potentially vulnerable
> +	 * to Meltdown regardless of the specific optimizations
> +	 * PTI is currently using.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_PTI))
> +		set_memory_np(begin_ul, len_pages);
>  }
>  
>  void __ref free_initmem(void)
> _

Ironically, that set_memory_np() is giving me a problem.

I don't see it when booting the 8GB laptop normally, but when booting
with "mem=1G", I get a not-present fault when ext4_iget() is trying to
do its business in starting init.  It boots fine with "mem=1G nopti".

I get the feeling that set_memory_np() is marking those freed pages
as not-present in the direct map, so they're no longer usable at all.

I can jot down some console messages if you need, but hope I've said
enough for you to see it immediately, and just say whoops, forget 5/5?

Hugh
