linux-kernel - Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated

Message-ID: <20220416004946.tydhjaewitocy2cn@treble>
Date:   Fri, 15 Apr 2022 17:49:46 -0700
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Dmitry Monakhov <dmtrmonakhov@...dex-team.ru>,
        linux-kernel@...r.kernel.org, x86@...nel.org, mingo@...hat.com,
        kim.phillips@....com
Subject: Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack
 info was updated

On Tue, Apr 12, 2022 at 12:08:37PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 12, 2022 at 10:40:03AM +0300, Dmitry Monakhov wrote:
> > get_stack_info() detects the stack type only by the begin address, so we
> > must check that the address range in question is fully covered by the
> > detected stack.
> > 
> > Otherwise the following crash is possible:
> > -> unwind_next_frame
> >    case ORC_TYPE_REGS:
> >      if (!deref_stack_regs(state, sp, &state->ip, &state->sp))
> >      -> deref_stack_regs
> >        -> stack_access_ok  <- here addr is inside stack range, but addr+len-1 is not, but we still exit with success
> >      *ip = READ_ONCE_NOCHECK(regs->ip); <- Here we hit stack guard fault
> > OOPS LOG:
> > <0>[ 1941.845743] BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)
> 
> 
> > <4>[ 1941.845751]  get_perf_callchain+0x10d/0x280
> > <4>[ 1941.845751]  perf_callchain+0x6e/0x80
> > <4>[ 1941.845752]  perf_prepare_sample+0x87/0x540
> > <4>[ 1941.845752]  perf_event_output_forward+0x31/0x90
> > <4>[ 1941.845753]  __perf_event_overflow+0x5a/0xf0
> > <4>[ 1941.845754]  perf_ibs_handle_irq+0x340/0x5b0
> > <4>[ 1941.845757]  perf_ibs_nmi_handler+0x34/0x60
> > <4>[ 1941.845757]  nmi_handle+0x79/0x190
> 
> Urgh, this is another instance of trying to unwind an IP that no longer
> matches the stack.
> 
> Fixing the unwinder bug is good, but arguably we should also fix this
> IBS stuff, see 6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")

I remember that nastiness well.  So it's still broken?  Or is this a
regression?  Maybe we wouldn't notice it except for this triggered
unwinder bug?
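
For reference, the off-the-end access described in the quoted patch can be
modelled in plain user-space C. This is only a rough sketch, not the kernel
code: struct stack_range, start_on_stack() and range_on_stack() are
hypothetical names made up for illustration, and the addresses in the note
below are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a detected stack: valid bytes are [begin, end). */
struct stack_range {
	uintptr_t begin;
	uintptr_t end;	/* one past the last valid byte */
};

/* Start-only check: accepts any access whose first byte is on the stack. */
static bool start_on_stack(const struct stack_range *s, uintptr_t addr)
{
	assert(s->begin <= s->end);
	return addr >= s->begin && addr < s->end;
}

/*
 * Full-range check: the whole access [addr, addr + len) must be on the
 * stack. The comparison is written as len <= end - addr so that
 * addr + len is never computed and cannot wrap around.
 */
static bool range_on_stack(const struct stack_range *s, uintptr_t addr,
			   size_t len)
{
	if (len == 0 || !start_on_stack(s, addr))
		return false;
	return len <= s->end - addr;
}
```

With a stack of [0x1000, 0x2000), a 16-byte read at 0x1ff8 passes
start_on_stack() but fails range_on_stack(), which is the shape of the case
in the quoted trace: the start of the access is in range while addr+len-1
is not.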

-- 
Josh