linux-kernel - Re: [BUG -tip] kmemleak and stacktrace cause page faul

Message-ID: <alpine.DEB.2.21.1910231533180.2308@nanos.tec.linutronix.de>
Date:   Wed, 23 Oct 2019 15:47:57 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Cyrill Gorcunov <gorcunov@...il.com>
cc:     LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>, linux-mm@...ck.org,
        Catalin Marinas <catalin.marinas@....com>
Subject: Re: [BUG -tip] kmemleak and stacktrace cause page faul

On Wed, 23 Oct 2019, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> Ergo ep must be a valid pointer pointing to the statically allocated and
> statically initialized estack_pages array.
> 
>         /* Guard page? */
>         if (!ep->size)
> 
> How on earth can dereferencing ep crash the machine?
> 
>                 return false;
> 
> That does not make any sense.
> 
> Surely, we should not even try to decode exception stack when
> cea_exception_stacks is not yet initialized, but that does not explain
> anything what you are observing.

So looking at your actual crash:

[    0.027246] BUG: unable to handle page fault for address: 0000000000001ff0

So this dereferences the stack pointer address.

[    0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888

ep correctly points to estack_pages[1], but that entry is bogus here because
0x1010 is not a valid stack value. Dereferencing ep still does not make it crash.
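
To illustrate why the bogus stack value gets this far, here is a minimal
user-space sketch of the range check and page index calculation. It is not
the kernel code: the helper name and the constants are assumptions, with the
area size taken from the "end 0xd000" value in the log above and the index
calculation inferred from the "k 1" value printed there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* Assumption: size of the exception stack area, from "end 0xd000" in the log. */
#define ESTACK_AREA_SIZE 0xd000ULL

/*
 * Hypothetical helper: returns true and stores the page index in *k when
 * stk lies inside [begin, begin + ESTACK_AREA_SIZE).
 */
static bool estack_page_index(uint64_t begin, uint64_t stk, uint64_t *k)
{
	uint64_t end = begin + ESTACK_AREA_SIZE;

	assert(k != NULL);

	/* Same shape as the "outside the exception stack area" bail-out. */
	if (stk < begin || stk >= end)
		return false;

	*k = (stk - begin) >> PAGE_SHIFT;
	return true;
}
```

With begin = 0 the check accepts stk = 0x1010 and yields page index 1, which
matches the log line: an uninitialized base makes any small address look like
it lies inside the exception stack area.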

The crash happens farther down:

    	end = begin + (unsigned long)ep->size;

==> end = 0x2000

        regs = (struct pt_regs *)end - 1;

==> regs = 0x2000 - sizeof(struct pt_regs), so &regs->sp = 0x1ff0

        info->type      = ep->type;
        info->begin     = (unsigned long *)begin;
        info->end       = (unsigned long *)end;

---->	info->next_sp   = (unsigned long *)regs->sp;

	This is the crashing instruction trying to access 0x1ff0
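
The address arithmetic can be checked in user space. The sketch below uses a
hypothetical mock_pt_regs that mirrors the x86-64 register frame as 21
eight-byte fields. That layout is an assumption here and is not quoted from
the kernel headers. Note that `(struct pt_regs *)end - 1` steps back by one
whole structure, so the faulting address is that of the sp member inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the x86-64 struct pt_regs field order. */
struct mock_pt_regs {
	uint64_t r15, r14, r13, r12, bp, bx;
	uint64_t r11, r10, r9, r8;
	uint64_t ax, cx, dx, si, di;
	uint64_t orig_ax;
	uint64_t ip, cs, flags, sp, ss;
};

static_assert(sizeof(struct mock_pt_regs) == 168, "21 registers of 8 bytes");

/* Address of the register frame placed at the very top of the stack. */
static uint64_t regs_base(uint64_t end)
{
	return end - sizeof(struct mock_pt_regs);
}

/* Address that reading regs->sp would touch. */
static uint64_t regs_sp_address(uint64_t end)
{
	return regs_base(end) + offsetof(struct mock_pt_regs, sp);
}
```

Under these assumptions, regs_sp_address(0x2000) evaluates to 0x1ff0, the
address reported in the page fault message.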

And you are right, this happens because cea_exception_stacks is not yet
initialized, which makes begin = 0 and therefore points into nirvana.

So the fix is trivial.

Thanks,

	tglx

8<------------
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)