Message-ID: <A5ED84D3BB3A384992CBB9C77DEDA4D41B00EA43@USINDEM103.corp.hds.com>
Date:	Mon, 3 Jun 2013 19:44:44 +0000
From:	Seiji Aguchi <seiji.aguchi@....com>
To:	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...nel.org>
CC:	"H. Peter Anvin" <hpa@...ux.intel.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Borislav Petkov (bp@...en8.de)" <bp@...en8.de>
Subject: RE: [PATCH (v2 - fixup)] x86: Have debug/nmi restore the IDT it
 replaced

Steven,

I think we can solve this problem simply by checking which IDT should
be current, rather than holding and restoring the old one.

Please see a patch set below.
	[PATCH v13 0/3]trace,x86: irq vector tracepoint support
	[PATCH v13 1/3] tracing: Add DEFINE_EVENT_FN() macro
	[PATCH v13 2/3] trace,x86: Introduce entering/exiting_irq()
	[PATCH v13 3/3] trace,x86: Add irq vector tracepoints
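As a rough illustration of the idea (a toy userspace sketch, not the
actual kernel code: the "IDTs" are modeled as plain ints, loading an
IDT as a pointer assignment, and the names trace_enabled/current_idt
are hypothetical stand-ins), the exit path can recompute which IDT to
load from current state instead of restoring a saved descriptor:

```c
#include <assert.h>

/* Toy model: normal_idt/trace_idt/nmi_idt stand in for idt_descr,
 * the trace IDT and nmi_idt_descr; current_idt models what the CPU
 * has loaded; trace_enabled models whether the vector tracepoints
 * are active. Only the shape of set_zero/reset mirrors the idea. */
static int normal_idt, trace_idt, nmi_idt;
static int *current_idt = &normal_idt;

static int trace_enabled;          /* is irq vector tracing on? */
static int debug_stack_use_ctr;    /* nesting counter, as in the patch */

/* Pick the IDT from current tracing state -- no saved copy needed. */
static void load_current_idt(void)
{
        current_idt = trace_enabled ? &trace_idt : &normal_idt;
}

static void debug_stack_set_zero(void)
{
        debug_stack_use_ctr++;
        current_idt = &nmi_idt;        /* load_idt(&nmi_idt_descr) */
}

static void debug_stack_reset(void)
{
        /* Re-entrant: a nested NMI just re-runs the same check. */
        if (--debug_stack_use_ctr == 0)
                load_current_idt();    /* no hold/restore race */
}
```

Because the choice is recomputed on exit, a nested NMI cannot clobber
a saved descriptor; the trace IDT survives debug/NMI nesting for free.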

Seiji

> -----Original Message-----
> From: Steven Rostedt [mailto:rostedt@...dmis.org]
> Sent: Tuesday, May 28, 2013 9:32 AM
> To: Ingo Molnar
> Cc: H. Peter Anvin; LKML; Thomas Gleixner; Frederic Weisbecker; Andrew Morton; Seiji Aguchi; Borislav Petkov (bp@...en8.de)
> Subject: [PATCH (v2 - fixup)] x86: Have debug/nmi restore the IDT it replaced
> 
> I've been playing with Seiji's patches:
> 
> https://lkml.org/lkml/2013/1/21/573
> 
> Which add tracing to the x86 interrupt vectors. To keep the tracing
> overhead at zero when tracing is inactive, the patches swap the IDT,
> replacing the irq vectors with versions that contain tracepoints only
> when tracing is enabled. Even though disabled tracepoints are just
> nops, the swap eliminates even that small overhead.
> 
> These seemed to work, until I ran the following command with trace-cmd:
> 
>  trace-cmd start -p function_graph -g "smp_trace_apic_timer_interrupt" \
>    -g "smp_apic_timer_interrupt" -e irq_vectors
> 
> The trace-cmd invocation above not only enables the irq_vectors
> events, but also produces call graphs of the two interrupt vector
> functions: smp_trace_apic_timer_interrupt and smp_apic_timer_interrupt.
> 
> The result was rather surprising. It showed only
> smp_apic_timer_interrupt being called, and not the trace version.
> Obviously, the tracepoints for the vectors were also not there.
> 
> When starting without the function graph tracer, it worked fine, but I
> wanted to see the trace versions being called to be sure, which required
> one of the function tracers.
> 
> Investigating, I found the problem. It's in the NMI and breakpoint IDT
> handling. I won't explain it in detail here, but see:
> 
> commit f8988175f "x86: Allow nesting of the debug stack IDT setting"
> commit 42181186a "x86: Add counter when debug stack is used with
> interrupts enabled"
> commit 228bdaa95 "x86: Keep current stack in NMI breakpoints"
> 
> The basic gist of the above commits is that on NMI or DEBUG trap
> entering, it needs to modify the IDT so that the stack pointer doesn't
> get reset to the top of the stack again. On boot up, two IDT tables are
> created. One for normal operations and one for this NMI/DEBUG case.
> 
> The issue is that this code simply toggles between the two IDT tables.
> If that toggle happens after Seiji's swap has already taken place, it
> replaces the trace IDT with the normal table, and tracing stops, which
> sorta defeats the purpose.
> 
> To solve this, I've added a 'hold_idt_descr' per cpu variable that
> stores the IDT that was loaded, and I use that for the restore. If the
> DEBUG/NMI came in when the IDT was normal, it replaces it with the
> normal IDT on exit, and if it came in when it was the trace IDT, it
> puts back the trace IDT.
> 
> I've run a few tests so far on this code, but I need to run more
> stressful ones. Meanwhile, until I find any bugs, I'm posting this patch
> for your enjoyment. I think I got all the cases, as NMIs cause the
> store/restore functions to be re-entered without any locks.
> 
> Signed-off-by: Steven Rostedt <rostedt@...dmis.org>
> ---
>  arch/x86/kernel/cpu/common.c |   79 ++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 77 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 22018f7..1315367 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1149,10 +1149,54 @@ int is_debug_stack(unsigned long addr)
>  }
> 
>  static DEFINE_PER_CPU(u32, debug_stack_use_ctr);
> +static DEFINE_PER_CPU(struct desc_ptr, hold_idt_descr);
> +static DEFINE_PER_CPU(struct desc_ptr, copy_idt_descr);
> 
> +/*
> + * Debug traps and NMIs will swap the IDT to have the debug
> + * trap not modify the stack (nmi_idt_descr). But as both the
> + * debug and NMIs share this, they need to be re-entrant. A debug
> + * trap could be doing the swap and after it incs the debug_stack_use_ctr,
> + * an NMI could come in. It too would need to do the swap, and it would
> + * need to swap every time.
> + *
> + * But, the IDT can be changed by other users, and this must restore
> + * that as well.
> + *
> + * Luckily, as interrupts are disabled from the set_zero to reset,
> + * we do not need to worry about the IDT changing underneath us
> + * from other users.
> + */
>  void debug_stack_set_zero(void)
>  {
> +       /*
> +        * Writing to the IDT is atomic. If an NMI were to come
> +        * in after we did the compare but before the store_idt(),
> +        * it too would see the address == 0 and do the store itself.
> +        */
> +       if (this_cpu_read(hold_idt_descr.address) == 0)
> +               store_idt(this_cpu_ptr(&hold_idt_descr));
> +
> +       /*
> +        * If an NMI were to come in now, it would not set hold_idt_descr,
> +        * but on exit, it would restore the IDT because the counter is
> +        * still zero here. Then it would set the .address back to zero too.
> +        */
> +
>  	this_cpu_inc(debug_stack_use_ctr);
> +
> +       /*
> +        * NMI's coming in now are not an issue as we set the .address
> +        * and also incremented the ctr. But, if an NMI came in before
> +        * the counter was incremented, and after the .address was set,
> +        * the NMI would set the .address back to zero, and would have
> +        * restored the IDT. We need to check if that happened.
> +        * If it did, then the .address would be zero here again.
> +        */
> +       if (unlikely(this_cpu_read(hold_idt_descr.address) == 0))
> +               store_idt(this_cpu_ptr(&hold_idt_descr));
> +
> +       /* On enter, we always want to use the nmi_idt_descr */
>  	load_idt((const struct desc_ptr *)&nmi_idt_descr);
>  }
> 
> @@ -1160,8 +1204,39 @@ void debug_stack_reset(void)
>  {
>  	if (WARN_ON(!this_cpu_read(debug_stack_use_ctr)))
>  		return;
> -	if (this_cpu_dec_return(debug_stack_use_ctr) == 0)
> -		load_idt((const struct desc_ptr *)&idt_descr);
> +
> +       if (WARN_ON(!this_cpu_read(hold_idt_descr.address)))
> +               return;
> +
> +       /*
> +        * This is the tricky part. We need to restore the old
> +        * IDT to what it was before we entered, but an NMI could
> +        * come in at any point, and do the same.
> +        *
> +        * If the count is 1, then we are the first to enter and
> +        * we need to update the IDT. But, we must do that after
> +        * we decrement the counter, in which case, if an NMI
> +        * comes in, it too will see the 1. To get around this,
> +        * we update a copy first.
> +        *
> +        * The copy will always contain what we want to load the
> +        * IDT with.
> +        */
> +       if (this_cpu_read(debug_stack_use_ctr) == 1)
> +               *this_cpu_ptr(&copy_idt_descr) = *this_cpu_ptr(&hold_idt_descr);
> +
> +       if (this_cpu_dec_return(debug_stack_use_ctr) == 0) {
> +               /*
> +                * We use the copy to save to the IDT, as it will always contain
> +                * what we want to restore the IDT to.
> +                */
> +               load_idt(this_cpu_ptr(&copy_idt_descr));
> +               /*
> +                * Now clear out the hold_idt_descr.address, to let all new
> +                * users restore it from the IDT.
> +                */
> +               this_cpu_write(hold_idt_descr.address, 0);
> +       }
>  }
> 
>  #else	/* CONFIG_X86_64 */
> --
> 1.7.3.4
> 
> 
