linux-kernel - Re: v4.10: kernel stack frame pointer .. has bad value (null)


Message-ID: <20170222225614.4z4z24uz6l2iz6qm@treble>
Date:   Wed, 22 Feb 2017 16:56:14 -0600
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Pavel Machek <pavel@....cz>
Cc:     kernel list <linux-kernel@...r.kernel.org>, mingo@...nel.org,
        luto@...nel.org, bp@...en8.de, brgerst@...il.com,
        dvlasenk@...hat.com, hpa@...or.com, torvalds@...ux-foundation.org,
        peterz@...radead.org, tglx@...utronix.de
Subject: Re: v4.10: kernel stack frame pointer .. has bad value (null)

On Wed, Feb 22, 2017 at 11:47:55PM +0100, Pavel Machek wrote:
> Hi!
> 
> > > > > Thinkpad X220, in 32 bit mode... and I'm getting rather scary messages
> > > > > from kernel during boot:
> > > > > 
> > > > > Git blame says that message comes from commit
> > > > > 
> > > > > commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
> > > > > Author: Josh Poimboeuf <jpoimboe@...hat.com>
> > > > > Date:   Thu Oct 27 08:10:58 2016 -0500
> > > > > 
> > > > >     x86/unwind: Ensure stack grows down
> > > > > 
> > > > >     Add a sanity check to ensure the stack only grows down, and print a
> > > > >     warning if the check fails.
> > > > > 
> > > > > Any ideas?
> > > > 
> > > > I don't think I've seen this one.  Any chance this came after resuming
> > > > from a hibernation or suspend?
> > > 
> > > No, it was during the boot. Notice the timestamps...
> > 
> > Right, but doesn't waking from hibernation initially start with a
> > timestamp of zero?
> 
> Aha, ok, I guess so. Anyway... no hibernation was involved.
> 
> > The reason I asked is because of the following part of the stack
> > dump:
> 
> > 
> > > > > [    1.048429] f50cdf9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > > > [    1.048429] f50cdfa0: 0000000000200002 (0x200002)
> > > > > [    1.048430] f50cdfa4: 0000000000000000 ...
> > > > > [    1.048432] f50cdfa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > > > [    1.048432] f50cdfac: 0000000000000000 ...
> > > > > [    1.048433] f50cdff4: 0000000000000100 (0x100)
> > > > > [    1.048434] f50cdff8: 0000000000000200 (0x200)
> > > > > [    1.048435] f50cdffc: 0000000000000000 ...
> > > > > [    1.060368] [drm] Supports vblank timestamp caching Rev 2
> > 
> > Somehow, startup_32_smp() is on the stack twice.  The stack unwind led
> > to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
> > 0xf50cdfa8 (which is where it should normally be).  So the question is
> > how startup_32_smp() got executed the second time, with the wrong stack
> > offset.
> 
> Not much idea... but this is stack dump, right? Just because some
> value is on the stack does not mean it is a return address, no?

Right, but the one at 0xf50cdfa8 is where startup_32_smp() is
*supposed* to be.  If the unwinder had unwound to that one, it wouldn't
have complained.  So it looks to me like the CPU somehow booted twice:
the first time at the right stack address, and the second time it
somehow ended up with a different stack address.
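
For anyone who wants to check a longer dump for the same pattern, here is
a rough sketch (not anything in the kernel tree) that scans stack dump
lines in the format quoted above and reports any text address that shows
up in more than one stack slot.  The regex, the function name
repeated_text_addresses() and the DUMP sample are illustrative
assumptions; only lines with a symbol+offset annotation are considered.
As noted above, a value sitting on the stack is not necessarily a return
address, so a hit is only a hint.

```python
import re
from collections import defaultdict

# Sample lines copied from the stack dump quoted earlier in this thread.
DUMP = """\
[    1.048429] f50cdf9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
[    1.048429] f50cdfa0: 0000000000200002 (0x200002)
[    1.048430] f50cdfa4: 0000000000000000 ...
[    1.048432] f50cdfa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
[    1.048432] f50cdfac: 0000000000000000 ...
"""

# Matches "<slot>: <value> (<symbol>+0x<off>/0x<size>)" after the timestamp.
# Assumed format; slots annotated with a bare number or "..." are skipped.
LINE_RE = re.compile(
    r"\]\s+(?P<slot>[0-9a-f]+):\s+(?P<value>[0-9a-f]+)\s+"
    r"\((?P<sym>[A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)


def repeated_text_addresses(dump):
    """Map (symbol, value) to the stack slots holding it, if more than one."""
    seen = defaultdict(list)
    for line in dump.splitlines():
        m = LINE_RE.search(line)
        if m:
            key = (m.group("sym"), int(m.group("value"), 16))
            seen[key].append(int(m.group("slot"), 16))
    return {key: slots for key, slots in seen.items() if len(slots) > 1}


if __name__ == "__main__":
    for (sym, value), slots in repeated_text_addresses(DUMP).items():
        where = ", ".join(hex(s) for s in slots)
        print(f"{sym} ({value:#x}) found at stack slots: {where}")
```

On the sample above this flags startup_32_smp at both 0xf50cdf9c and
0xf50cdfa8, which are the two slots discussed in this message.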

> And .... startup_32_smp is kind of "interesting" function. Take a
> look...

Yes, it's used in bringing up the CPU.

-- 
Josh