linux-kernel - Re: [tip:x86/debug] x86, reboot: Use NMI instead of REBOOT

Message-ID: <CAE9FiQWX_2bV1phLCUrEpiPSfo24xWEv3Y+DZnwmi9JJgJHbmQ@mail.gmail.com>
Date:	Tue, 20 Dec 2011 14:38:39 -0800
From:	Yinghai Lu <yinghai@...nel.org>
To:	mingo@...hat.com, hpa@...or.com, linux-kernel@...r.kernel.org,
	andi@...stfloor.org, torvalds@...ux-foundation.org,
	peterz@...radead.org, robert.richter@....com, tglx@...utronix.de,
	mingo@...e.hu, dzickus@...hat.com
Cc:	linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/debug] x86, reboot: Use NMI instead of REBOOT_VECTOR to
 stop cpus

On Mon, Dec 5, 2011 at 5:21 AM, tip-bot for Don Zickus
<dzickus@...hat.com> wrote:
> Commit-ID:  3603a2512f9e69dc87914ba922eb4a0812b21cd6
> Gitweb:     http://git.kernel.org/tip/3603a2512f9e69dc87914ba922eb4a0812b21cd6
> Author:     Don Zickus <dzickus@...hat.com>
> AuthorDate: Thu, 13 Oct 2011 15:14:25 -0400
> Committer:  Ingo Molnar <mingo@...e.hu>
> CommitDate: Mon, 5 Dec 2011 12:00:14 +0100
>
> x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus
>
> A recent discussion started talking about the locking on the
> pstore fs and how it relates to the kmsg infrastructure.  We
> noticed it was possible for userspace to r/w to the pstore fs
> (grabbing the locks in the process) and block the panic path
> from r/w to the same fs.
>
> The reason was the cpu with the lock could be doing work while
> the crashing cpu is panic'ing.  Busting those spinlocks might
> cause those cpus to step on each other's data.  Fine, fair
> enough.
>
> It was suggested it would be nice to serialize the panic path
> (ie stop the other cpus) and have only one cpu running.  This
> would allow us to bust the spinlocks and not worry about another
> cpu stepping on the data.
>
> Of course, smp_send_stop() does this in the panic case.
> kmsg_dump() would have to be moved to be called after it.  Easy
> enough.
>
> The only problem is on x86 the smp_send_stop() function calls
> the REBOOT_VECTOR.  Any cpu with irqs disabled (which pstore and
> its backend ERST would do), block this IPI and thus do not stop.
>  This makes it difficult to reliably log data to the pstore fs.
>
> The patch below switches from the REBOOT_VECTOR to NMI (and
> mimics what kdump does).  Switching to NMI allows us to deliver
> the IPI when irqs are disabled, increasing the reliability of
> this function.
>
> However, Andi carefully noted that on some machines this
> approach does not work because of broken BIOSes or whatever.
>
> To help accommodate this, the next couple of patches will run a
> selftest and provide a knob to disable.
>
> V2:
>  uses atomic ops to serialize the cpu that shuts everyone down
> V3:
>  comment cleanup
>
> Signed-off-by: Don Zickus <dzickus@...hat.com>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Robert Richter <robert.richter@....com>
> Cc: seiji.aguchi@....com
> Cc: vgoyal@...hat.com
> Cc: mjg@...hat.com
> Cc: tony.luck@...el.com
> Cc: gong.chen@...el.com
> Cc: satoru.moriya@....com
> Cc: avi@...hat.com
> Cc: Andi Kleen <andi@...stfloor.org>
> Link: http://lkml.kernel.org/r/1318533267-18880-2-git-send-email-dzickus@redhat.com
> Signed-off-by: Ingo Molnar <mingo@...e.hu>
> ---
>  arch/x86/kernel/smp.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 16204dc..e72b175 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -29,6 +29,7 @@
>  #include <asm/mmu_context.h>
>  #include <asm/proto.h>
>  #include <asm/apic.h>
> +#include <asm/nmi.h>
>  /*
>  *     Some notes on x86 processor bugs affecting SMP operation:
>  *
> @@ -148,6 +149,60 @@ void native_send_call_func_ipi(const struct cpumask *mask)
>        free_cpumask_var(allbutself);
>  }
>
> +static atomic_t stopping_cpu = ATOMIC_INIT(-1);
> +
> +static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
> +{
> +       /* We are registered on stopping cpu too, avoid spurious NMI */
> +       if (raw_smp_processor_id() == atomic_read(&stopping_cpu))
> +               return NMI_HANDLED;
> +
> +       stop_this_cpu(NULL);
> +
> +       return NMI_HANDLED;
> +}
> +
> +static void native_nmi_stop_other_cpus(int wait)
> +{
> +       unsigned long flags;
> +       unsigned long timeout;
> +
> +       if (reboot_force)
> +               return;
> +
> +       /*
> +        * Use an own vector here because smp_call_function
> +        * does lots of things not suitable in a panic situation.
> +        */
> +       if (num_online_cpus() > 1) {
> +               /* did someone beat us here? */
> +               if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id() != -1))
> +                       return;
> +
> +               if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
> +                                        NMI_FLAG_FIRST, "smp_stop"))
> +                       /* Note: we ignore failures here */
> +                       return;
> +
> +               /* sync above data before sending NMI */
> +               wmb();
> +
> +               apic->send_IPI_allbutself(NMI_VECTOR);
> +
> +               /*
> +                * Don't wait longer than a second if the caller
> +                * didn't ask us to wait.
> +                */
> +               timeout = USEC_PER_SEC;
> +               while (num_online_cpus() > 1 && (wait || timeout--))
> +                       udelay(1);
> +       }
> +
> +       local_irq_save(flags);
> +       disable_local_APIC();
> +       local_irq_restore(flags);
> +}
> +
>  /*
>  * this function calls the 'stop' function on all other CPUs in the system.
>  */
> @@ -160,7 +215,7 @@ asmlinkage void smp_reboot_interrupt(void)
>        irq_exit();
>  }
>
> -static void native_stop_other_cpus(int wait)
> +static void native_irq_stop_other_cpus(int wait)
>  {
>        unsigned long flags;
>        unsigned long timeout;
> @@ -230,7 +285,7 @@ struct smp_ops smp_ops = {
>        .smp_prepare_cpus       = native_smp_prepare_cpus,
>        .smp_cpus_done          = native_smp_cpus_done,
>
> -       .stop_other_cpus        = native_stop_other_cpus,
> +       .stop_other_cpus        = native_nmi_stop_other_cpus,
>        .smp_send_reschedule    = native_smp_send_reschedule,
>
>        .cpu_up                 = native_cpu_up,
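
The quoted patch relies on two small mechanisms: a compare-and-exchange on stopping_cpu so that only the first caller sends the NMI (the V2 note), and a polling loop that gives up after roughly a second unless the caller asked to wait. The sketch below is a user-space approximation using C11 atomics, not kernel code. The names claim_stopper, is_stopper, wait_for_others, and the stop_after knob are hypothetical and exist only for illustration. Note that in the line as quoted, "!= -1" sits inside the third argument of atomic_cmpxchg(); the sketch assumes the intent suggested by the "did someone beat us here?" comment, which is to compare the previous value against -1.

```c
#include <stdatomic.h>

#define NO_CPU (-1)

/* Hypothetical user-space stand-in for the patch's stopping_cpu. */
atomic_int stopping_cpu = NO_CPU;

/* Returns 1 if this_cpu claimed the stopper role, 0 if another CPU already had. */
int claim_stopper(int this_cpu)
{
	int expected = NO_CPU;

	/* Stores this_cpu only while the value is still NO_CPU. */
	return atomic_compare_exchange_strong(&stopping_cpu, &expected, this_cpu);
}

/* Mirrors the callback's check: the stopping CPU must not halt itself. */
int is_stopper(int this_cpu)
{
	return atomic_load(&stopping_cpu) == this_cpu;
}

/*
 * Bounded wait in the shape of the patch's loop. *online stands in for
 * num_online_cpus(); stop_after is a hypothetical knob: after that many
 * polls the other CPUs count as stopped (0 means they never stop).
 * Returns the number of polls performed.
 */
long wait_for_others(int *online, long stop_after, int wait, long budget)
{
	long timeout = budget;
	long polls = 0;

	while (*online > 1 && (wait || timeout--)) {
		polls++;	/* the kernel calls udelay(1) at this point */
		if (stop_after && polls >= stop_after)
			*online = 1;
	}
	return polls;
}
```

With wait set to 0 the loop ends once the budget is used up even if other CPUs are still online; with wait set to nonzero it keeps polling until they are gone, which is why a CPU that never responds to the NMI would stall a waiting caller.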

This broke kexec on our Intel Nehalem, Westmere and Sandy Bridge platforms.
The system gets reset while trying to kexec the second kernel.

3603a2512f9e69dc87914ba922eb4a0812b21cd6 is the first bad commit
commit 3603a2512f9e69dc87914ba922eb4a0812b21cd6
Author: Don Zickus <dzickus@...hat.com>
Date:   Thu Oct 13 15:14:25 2011 -0400

    x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus

    A recent discussion started talking about the locking on the
    pstore fs and how it relates to the kmsg infrastructure.  We
    noticed it was possible for userspace to r/w to the pstore fs
    (grabbing the locks in the process) and block the panic path
    from r/w to the same fs.

    The reason was the cpu with the lock could be doing work while
    the crashing cpu is panic'ing.  Busting those spinlocks might
    cause those cpus to step on each other's data.  Fine, fair
    enough.

    It was suggested it would be nice to serialize the panic path
    (ie stop the other cpus) and have only one cpu running.  This
    would allow us to bust the spinlocks and not worry about another
    cpu stepping on the data.

    Of course, smp_send_stop() does this in the panic case.
    kmsg_dump() would have to be moved to be called after it.  Easy
    enough.

    The only problem is on x86 the smp_send_stop() function calls
    the REBOOT_VECTOR.  Any cpu with irqs disabled (which pstore and
    its backend ERST would do), block this IPI and thus do not stop.
     This makes it difficult to reliably log data to the pstore fs.

    The patch below switches from the REBOOT_VECTOR to NMI (and
    mimics what kdump does).  Switching to NMI allows us to deliver
    the IPI when irqs are disabled, increasing the reliability of
    this function.

    However, Andi carefully noted that on some machines this
    approach does not work because of broken BIOSes or whatever.

    To help accommodate this, the next couple of patches will run a
    selftest and provide a knob to disable.

    V2:
      uses atomic ops to serialize the cpu that shuts everyone down
    V3:
      comment cleanup

    Signed-off-by: Don Zickus <dzickus@...hat.com>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Cc: Peter Zijlstra <peterz@...radead.org>
    Cc: Robert Richter <robert.richter@....com>
    Cc: seiji.aguchi@....com
    Cc: vgoyal@...hat.com
    Cc: mjg@...hat.com
    Cc: tony.luck@...el.com
    Cc: gong.chen@...el.com
    Cc: satoru.moriya@....com
    Cc: avi@...hat.com
    Cc: Andi Kleen <andi@...stfloor.org>
    Link: http://lkml.kernel.org/r/1318533267-18880-2-git-send-email-dzickus@redhat.com
    Signed-off-by: Ingo Molnar <mingo@...e.hu>

:040000 040000 47a646e9ead83f34fb0728d88c786c875fab91dd 6f7a6d5ad8ed686199c0dfbc748ee07a0db97cc5 M      arch