linux-kernel - Re: [PATCH] x86: Don't clobber top of pt

Message-ID: <CAKUOC8V_bdrcP5zstq8Pb-71j5AenBfPHk=233bb7ZP_R654_w@mail.gmail.com>
Date:	Mon, 24 Sep 2012 09:55:35 -0700
From:	Salman Qazi <sqazi@...gle.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...ux.intel.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86: Don't clobber top of pt_regs in nested NMI

On Mon, Sep 24, 2012 at 6:27 AM, Steven Rostedt <rostedt@...dmis.org> wrote:
> On Tue, Sep 18, 2012 at 06:29:35PM -0700, Salman Qazi wrote:
>> The nested NMI modifies the place (instruction, flags and stack)
>> that the first NMI will iret to.  However, the copy of registers
>> modified is exactly the one that is part of pt_regs in
>> the first NMI.  This can change the behaviour of the first NMI.
>>
>> In particular, Google's arch_trigger_all_cpu_backtrace handler
>> also prints regions of memory surrounding addresses appearing in
>> registers.  This results in handled exceptions, after which nested NMIs
>> start coming in.  These nested NMIs change the value of registers
>> in pt_regs.  This can cause the original NMI handler to produce
>> incorrect output.
>
> Hmm, interesting problem.
>
>>
>> We solve this problem by introducing an extra copy of the iret
>> registers that are exclusively a part of pt_regs and are not modified
>> elsewhere.
>
> Yuck, 4 copies of the stack frame?
>
>> The downside is that the do_nmi function can no longer
>> change the control flow, as any values it writes to these five
>> registers will be discarded.
>
> I consider this a feature.
>
>>
>> Signed-off-by: Salman Qazi <sqazi@...gle.com>
>> ---
>>  arch/x86/kernel/entry_64.S |   20 +++++++++++++++++++-
>>  1 files changed, 19 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 69babd8..40ddb6d 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -1724,6 +1724,18 @@ repeat_nmi:
>>  end_repeat_nmi:
>>
>>       /*
>> +      * We want a running NMI handling routine to have a consistent
>> +      * picture of register state.  This should hold true even if
>> +      * there is a nested NMI.  Therefore, we let the nested NMI
>> +      * play with the previous copy of these registers and leave this
>> +      * new copy unmodified for do_nmi()
>> +      */
>> +     .rept 5
>> +     pushq_cfi 4*8(%rsp)
>> +     .endr
>> +     CFI_DEF_CFA_OFFSET SS+8-RIP
>
> Hmm, another solution that can be done without an extra copy, is to swap
> the return stack frame with the copy stack frame. This way, the copy is
> seen by the pt_regs and will always be correct. The end would basically
> be the same as you have below, just skip the copy and return.
>
> Now this breaks the idea that anything below the sp pointer is not safe
> to use. But this is the NMI stack in the controlled part of the NMI
> handler (no breakpoints allowed here). The NMI stack is special, which
> is why we have all this crap in the first place.
>
> It would be safe to save the copy stack below the stack pointer because
> the only thing that could possibly interrupt us here is another NMI,
> which would just reset the stack pointer to the top again, and notice
> that this is a nested NMI and return after modifying the return stack.
> Which is OK.
>
> At least this way, we avoid copying the stack frame a third time (two
> was enough).

Yes.  This is better.  For some reason, I thought about this and
decided against it.  But, I can't remember what crossed my mind.  I'll
try this.
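
As a rough illustration of the aliasing problem, and of what a private
copy of the iret registers buys us, here is a user-space C sketch.  It is
not kernel code: struct iret_frame, nested_nmi_fixup(), observed_rip()
and all the constants are made up for the example, and the real logic
lives in assembly in entry_64.S.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the five iret words (rip, cs, flags, rsp, ss). */
struct iret_frame {
	unsigned long rip, cs, rflags, rsp, ss;
};

#define ORIG_RIP   0x1000UL	/* made-up interrupted address */
#define REPEAT_RIP 0xdeadUL	/* made-up stand-in for repeat_nmi */

/* Toy nested-NMI path: redirect the frame the first NMI will iret to. */
static void nested_nmi_fixup(struct iret_frame *f)
{
	f->rip = REPEAT_RIP;
}

/*
 * Return the rip the first handler observes after a nested NMI.
 * private_copy == 0: the handler's view aliases the frame that the
 *                    nested NMI rewrites (the behaviour being fixed).
 * private_copy != 0: the handler looks at its own copy instead.
 */
static unsigned long observed_rip(int private_copy)
{
	struct iret_frame shared = { ORIG_RIP, 0x10, 0x246, 0x8000, 0x18 };
	struct iret_frame copy;
	struct iret_frame *regs = &shared;

	if (private_copy) {
		memcpy(&copy, &shared, sizeof(copy));
		regs = &copy;
	}

	nested_nmi_fixup(&shared);	/* nested NMI arrives mid-handler */
	return regs->rip;
}

/* Sanity check of the two cases described above. */
static void demo(void)
{
	assert(observed_rip(0) == REPEAT_RIP);	/* handler sees clobbered rip */
	assert(observed_rip(1) == ORIG_RIP);	/* handler sees original rip */
}
```

With the shared frame the handler sees the address written by the nested
NMI; with its own copy it keeps seeing the original one, at the cost that
anything the handler writes to its copy is discarded.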

>
> -- Steve
>
>
>> +
>> +     /*
>>        * Everything below this point can be preempted by a nested
>>        * NMI if the first NMI took an exception and reset our iret stack
>>        * so that we repeat another NMI.
>> @@ -1771,7 +1783,13 @@ nmi_swapgs:
>>  nmi_restore:
>>       RESTORE_ALL 8
>>       /* Clear the NMI executing stack variable */
>> -     movq $0, 10*8(%rsp)
>> +     movq $0, 15*8(%rsp)
>> +
>> +     /* Pop the extra copy of iret context that was saved above
>> +      * just for do_nmi()
>> +      */
>> +     addq $5*8, %rsp
>> +
>>       jmp irq_return
>>       CFI_ENDPROC
>>  END(nmi)
>>
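
The offset change in the last hunk follows from the five extra pushes.
A small C check of that arithmetic, assuming 8-byte stack slots as on
x86-64.  The names below are hypothetical; only the numbers 10, 15 and 5
come from the patch.

```c
#include <assert.h>

enum {
	SLOT_BYTES    = 8,	/* one stack slot on x86-64 */
	IRET_SLOTS    = 5,	/* the ".rept 5" pushes in the patch */
	OLD_FLAG_SLOT = 10	/* slot index used before the patch */
};

/*
 * Byte offset from %rsp of the "NMI executing" variable once
 * extra_copies additional iret frames sit between it and %rsp.
 */
static long nmi_flag_offset(int extra_copies)
{
	return (long)(OLD_FLAG_SLOT + extra_copies * IRET_SLOTS) * SLOT_BYTES;
}

/* Bytes to add to %rsp in order to drop extra_copies iret frames. */
static long bytes_to_pop(int extra_copies)
{
	return (long)extra_copies * IRET_SLOTS * SLOT_BYTES;
}

static void check_offsets(void)
{
	assert(nmi_flag_offset(0) == 10 * 8);	/* old: movq $0, 10*8(%rsp) */
	assert(nmi_flag_offset(1) == 15 * 8);	/* new: movq $0, 15*8(%rsp) */
	assert(bytes_to_pop(1) == 5 * 8);	/* new: addq $5*8, %rsp */
}
```

If the frames are swapped instead of copied a third time, as suggested
above, extra_copies stays at 0 and neither the offset nor the final addq
would need to change.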
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/