linux-kernel - Re: [RFC][PATCH 3/3] x86: Add workaround to NMI iret woes


Message-ID: <20111209124026.GB14470@Krystal>
Date:	Fri, 9 Dec 2011 07:40:26 -0500
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Jason Baron <jbaron@...hat.com>,
	"H. Peter Anvin" <hpa@...ux.intel.com>,
	Paul Turner <pjt@...gle.com>
Subject: Re: [RFC][PATCH 3/3] x86: Add workaround to NMI iret woes

Hi Steven,

* Steven Rostedt (rostedt@...dmis.org) wrote:
> On Thu, 2011-12-08 at 14:30 -0500, Steven Rostedt wrote:
> 
> > Suppose the first NMI hits a breakpoint and loses NMI context, then
> > hits another breakpoint, and a nested NMI arrives while that
> > breakpoint is being processed. When processing a breakpoint, the
> > stack changes to the breakpoint stack. If another NMI comes in here,
> > we can't rely on the interrupted stack being the NMI stack.
> 
> As I wrote this part of the change log, I thought of another nasty
> gotcha with breakpoints in NMIs.
> 
> Suppose you have a breakpoint in both normal context and NMI context.
> If an NMI comes in while the normal-context breakpoint is being
> processed, and the NMI triggers a breakpoint too, this has the same
> problem as nested NMIs. The NMI breakpoint handler will corrupt the
> stack of the breakpoint that was being processed when the NMI triggered.
> 
> I'm not sure how to handle this case. We could do something similar in
> the breakpoint code to handle the same thing. But this just seems
> really ugly.
> 
> Anyone with any better ideas?

The nesting counters + code region address checks I proposed a few days
ago should handle this correctly. Here is a very slightly updated
version:

variables used:

cpu-local int nmi_nest_count;
cpu-local int nmi_latch;
__nmi_epilogue_begin (pointer to text)
__nmi_epilogue_end (pointer to text)
REAL_NMI_STACK: beginning of the stack used for real nmi handler
LATCHED_NMI_STACK: beginning of the stack used for latched nmi handler

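/*
 * instruction_pointer() is taken here to mean the instruction pointer
 * of the interrupted context (the one saved on NMI entry), not the
 * current one.
 */
int in_nmi_epilogue(void)
{
  return (instruction_pointer() >= __nmi_epilogue_begin
          && instruction_pointer() < __nmi_epilogue_end);
}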

int in_nmi(void)
{
  return nmi_nest_count > 0;
}
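
For illustration only, the two predicates above can be written as plain C
that takes its inputs explicitly. This is a minimal sketch, not kernel
code: struct text_range, ip_in_range() and nest_count_active() are
hypothetical names, and the interrupted instruction pointer is passed in
as a parameter instead of being read from the saved register frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the text range
 * [__nmi_epilogue_begin, __nmi_epilogue_end).
 */
struct text_range {
    uintptr_t begin; /* first address inside the range */
    uintptr_t end;   /* first address past the range */
};

/* True if the interrupted instruction pointer lies in the half-open range. */
static bool ip_in_range(const struct text_range *r, uintptr_t interrupted_ip)
{
    assert(r->begin <= r->end);
    return interrupted_ip >= r->begin && interrupted_ip < r->end;
}

/* Mirrors in_nmi(): we are nested whenever the per-CPU count is positive. */
static bool nest_count_active(int nmi_nest_count)
{
    return nmi_nest_count > 0;
}
```

The range is half-open, matching the ">= begin && < end" test above, so
an address equal to the end label is treated as outside the epilogue.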

/* Use REAL_NMI_STACK */
real_nmi_handler: /* always running with nmis disabled */
  /*
   * We disable interrupts to ensure we don't have to deal with IRQs
   * when NMIs get re-enabled due to an iret from a fault/exception.
   */
  local_irq_disable();
  if (in_nmi_epilogue()) {
    nmi_latch = 0;
    /* set stack pointer to start of LATCHED_NMI_STACK */
    /* populate start of LATCHED_NMI_STACK with values for iret */
    goto latched_nmi_handler;
  }
  if (in_nmi()) {
    nmi_latch = 1;
    iret
  }
  nmi_nest_count++;
  /* set stack pointer to start of LATCHED_NMI_STACK */
  /* populate start of LATCHED_NMI_STACK with values for iret */
  goto latched_nmi_handler;


/* Use LATCHED_NMI_STACK */
latched_nmi_handler:	/* Can fault and reenable NMIs. */

  [ execute actual system NMI handler, including faults, int3, ... ]

  /*
   * note: test nmi_latch and iret instruction are within the epilogue
   * range to deal with latch test vs iret non-atomicity.  If a real nmi
   * nests over this range, it clears the nmi_latch flag and just
   * restarts the latched nmi handler.  No faults/exceptions/interrupts
   * are permitted in this region, except for the real NMI and MCEs
   * (TODO).
   */
__nmi_epilogue_begin:
  /*
   * here we restart the latched nmi handler if another nmi arrived
   * while nmi_nest_count was nonzero (i.e. it was latched).
   */
  if (nmi_latch) {
    nmi_latch = 0;
    goto latched_nmi_handler;
  }
  nmi_nest_count--;
  iret  /* restores interrupts */
__nmi_epilogue_end:
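
To make the counter and latch transitions easier to follow, here is a
small user-space model of the pseudocode above. It is only a sketch under
stated assumptions: struct nmi_state, nmi_entry() and nmi_epilogue() are
hypothetical names, stack switching and iret are not modeled, and the
in_nmi_epilogue() result is passed in as a boolean, so the model does not
distinguish where inside the epilogue an NMI lands.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-CPU state from the pseudocode. */
struct nmi_state {
    int nest_count; /* nmi_nest_count */
    int latch;      /* nmi_latch */
};

enum nmi_action {
    NMI_RUN_HANDLER,    /* switch to the latched stack and run the handler */
    NMI_LATCH_AND_IRET  /* record the NMI and return immediately */
};

/*
 * Models real_nmi_handler. interrupted_in_epilogue stands for the
 * result of in_nmi_epilogue() on the interrupted context.
 */
static enum nmi_action nmi_entry(struct nmi_state *s,
                                 bool interrupted_in_epilogue)
{
    assert(s);
    if (interrupted_in_epilogue) {
        s->latch = 0;
        return NMI_RUN_HANDLER; /* restart; nest_count is left unchanged */
    }
    if (s->nest_count > 0) {
        s->latch = 1;
        return NMI_LATCH_AND_IRET;
    }
    s->nest_count++;
    return NMI_RUN_HANDLER;
}

/*
 * Models the epilogue. Returns true if the latched handler must run
 * again, false if the handler is done and would iret.
 */
static bool nmi_epilogue(struct nmi_state *s)
{
    assert(s);
    if (s->latch) {
        s->latch = 0;
        return true;
    }
    s->nest_count--;
    return false;
}
```

In this model, a first NMI runs the handler with nest_count at 1, an NMI
arriving during the handler only sets the latch, and the epilogue then
reruns the handler once before dropping nest_count back to 0.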


Best regards,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com