Message-ID: <AANLkTimTg1uqXdAJ+GyZ3_B6n2xKWdqFL96jvv18JBM+@mail.gmail.com>
Date: Mon, 27 Sep 2010 20:39:24 +0800
From: huang ying <huang.ying.caritas@...il.com>
To: Robert Richter <robert.richter@....com>
Cc: Huang Ying <ying.huang@...el.com>, Don Zickus <dzickus@...hat.com>,
Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH -v2 4/7] x86, NMI, Rewrite NMI handler
Hi, Robert,

On Mon, Sep 27, 2010 at 5:41 PM, Robert Richter <robert.richter@....com> wrote:
> On 26.09.10 20:57:03, Huang Ying wrote:
>> The original NMI handler is quite outdated in many aspects. This patch
>> tries to fix it.
>>
>> The order in which the NMI sources are processed is changed as follows:
>>
>> notify_die(DIE_NMI_IPI);
>> notify_die(DIE_NMI);
>> /* process io port 0x61 */
>> nmi_watchdog_touch();
>> notify_die(DIE_NMIUNKNOWN);
>> unknown_nmi();
>>
>> DIE_NMI_IPI is used to process CPU-specific NMI sources, such as perf
>> event, oprofile, crash IPI, etc., while DIE_NMI is used to process
>> non-CPU-specific NMI sources, such as APEI (ACPI Platform Error
>> Interface) GHES (Generic Hardware Error Source), etc. Non-CPU-specific
>> NMI sources can be processed on any CPU.
>>
>> DIE_NMI_IPI must be processed before DIE_NMI. For example, a perf event
>> triggers an NMI on CPU 1 and, at the same time, APEI GHES triggers another
>> NMI on CPU 0. If DIE_NMI is processed before DIE_NMI_IPI, it is
>> possible that APEI GHES is processed on CPU 1, while an unknown NMI is
>> reported on CPU 0.
>
> I think macro names DIE_NMI_IPI and DIE_NMI should be swapped as
> e.g. the perf nmi is actually local and non-IPI.

DIE_NMI_IPI may not be a good name for perf, but DIE_NMI is an even
worse name for perf! DIE_NMI was originally used for the IOCHK and PCI SERR
NMIs.

> We might consider reworking the IPI thing completely, but maybe in a
> follow-on patch.
>
>>
>> In this new order of processing, performance-sensitive NMI sources
>> such as oprofile or perf event will have better performance because
>> the time-consuming IO port reading is done after them.
>>
>> Only one NMI is eaten for each NMI handler call, even for PCI SERR and
>> IOCHK NMIs. Because one NMI should be raised for each of them, eating
>> too many NMIs will cause unnecessary unknown NMIs.
>>
>> The die values used in NMI sources are fixed accordingly.
>>
>> The NMI handler in the patch is designed by Andi Kleen.
>>
>>
>> v2:
>>
>> - Split process NMI reason (0x61) on non-BSP into another patch
>>
>> Signed-off-by: Huang Ying <ying.huang@...el.com>
>> ---
>> arch/x86/kernel/cpu/perf_event.c | 1
>> arch/x86/kernel/traps.c | 80 +++++++++++++++++++-------------------
>> arch/x86/oprofile/nmi_int.c | 1
>> arch/x86/oprofile/nmi_timer_int.c | 2
>> drivers/char/ipmi/ipmi_watchdog.c | 2
>> drivers/watchdog/hpwdt.c | 2
>> 6 files changed, 43 insertions(+), 45 deletions(-)
>>
>> --- a/arch/x86/kernel/cpu/perf_event.c
>> +++ b/arch/x86/kernel/cpu/perf_event.c
>> @@ -1247,7 +1247,6 @@ perf_event_nmi_handler(struct notifier_b
>> return NOTIFY_DONE;
>>
>> switch (cmd) {
>> - case DIE_NMI:
>> case DIE_NMI_IPI:
>
> See my comment above. Same is true for oprofile and some other
> handlers below. It isn't an IPI and should be case DIE_NMI: instead.
>
>> break;
>> case DIE_NMIUNKNOWN:
>> --- a/arch/x86/kernel/traps.c
>> +++ b/arch/x86/kernel/traps.c
>> @@ -354,9 +354,6 @@ io_check_error(unsigned char reason, str
>> static notrace __kprobes void
>> unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
>> {
>> - if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
>> - NOTIFY_STOP)
>> - return;
>> #ifdef CONFIG_MCA
>> /*
>> * Might actually be able to figure out what the guilty party
>> @@ -385,51 +382,54 @@ static notrace __kprobes void default_do
>>
>> cpu = smp_processor_id();
>
> This should go to if (!cpu) and maybe we drop variable cpu completely.

The variable cpu is dropped in 5/7.

>>
>> - /* Only the BSP gets external NMIs from the system. */
>> - if (!cpu)
>> - reason = get_nmi_reason();
>> + /*
>> + * CPU-specific NMI must be processed before non-CPU-specific
>> + * NMI, otherwise we may lose it, because the CPU-specific
>> + * NMI can not be detected/processed on other CPUs.
>> + */
>>
>> - if (!(reason & NMI_REASON_MASK)) {
>> - if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 2, SIGINT)
>> - == NOTIFY_STOP)
>> - return;
>> + /*
>> + * CPU-specific NMI: send to specific CPU or NMI sources must
>> + * be processed on specific CPU
>> + */
>> + if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, 0, 2, SIGINT)
>> + == NOTIFY_STOP)
>> + return;
>>
>> -#ifdef CONFIG_X86_LOCAL_APIC
>
> Are you sure we may drop this option?

Yes. DIE_NMI is used for non-CPU-specific NMI sources now.

>> - if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
>> - == NOTIFY_STOP)
>> - return;
>> + /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
>> + if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
>> + return;
>
> As said, IPI and non-IPI are mixed up.

They are not mixed up; they are processed one after the other:
DIE_NMI_IPI first, then DIE_NMI.

>>
>> -#ifndef CONFIG_LOCKUP_DETECTOR
>> - /*
>> - * Ok, so this is none of the documented NMI sources,
>> - * so it must be the NMI watchdog.
>> - */
>> - if (nmi_watchdog_tick(regs, reason))
>> - return;
>> - if (!do_nmi_callback(regs, cpu))
>> -#endif /* !CONFIG_LOCKUP_DETECTOR */
>> - unknown_nmi_error(reason, regs);
>> -#else
>> - unknown_nmi_error(reason, regs);
>> + if (!cpu) {
>> + reason = get_nmi_reason();
>> + if (reason & NMI_REASON_MASK) {
>> + if (reason & NMI_REASON_SERR)
>> + pci_serr_error(reason, regs);
>> + else if (reason & NMI_REASON_IOCHK)
>> + io_check_error(reason, regs);
>> +#ifdef CONFIG_X86_32
>> + /*
>> + * Reassert NMI in case it became active
>> + * meanwhile as it's edge-triggered:
>> + */
>> + reassert_nmi();
>> #endif
>> + return;
>> + }
>> + }
>>
>> +#if defined(CONFIG_X86_LOCAL_APIC) && !defined(CONFIG_LOCKUP_DETECTOR)
>> + if (nmi_watchdog_tick(regs, reason))
>> return;
>> - }
>> - if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP)
>> + if (do_nmi_callback(regs, smp_processor_id()))
>> return;
>> -
>> - /* AK: following checks seem to be broken on modern chipsets. FIXME */
>> - if (reason & NMI_REASON_SERR)
>> - pci_serr_error(reason, regs);
>> - if (reason & NMI_REASON_IOCHK)
>> - io_check_error(reason, regs);
>> -#ifdef CONFIG_X86_32
>> - /*
>> - * Reassert NMI in case it became active meanwhile
>> - * as it's edge-triggered:
>> - */
>> - reassert_nmi();
>> #endif
>> +
>> + if (notify_die(DIE_NMIUNKNOWN, "nmi_unknown", regs, reason, 2, SIGINT)
>> + == NOTIFY_STOP)
>> + return;
>> +
>> + unknown_nmi_error(reason, regs);
>> }
>>
>> dotraplinkage notrace __kprobes void
>> --- a/arch/x86/oprofile/nmi_int.c
>> +++ b/arch/x86/oprofile/nmi_int.c
>> @@ -64,7 +64,6 @@ static int profile_exceptions_notify(str
>> int ret = NOTIFY_DONE;
>>
>> switch (val) {
>> - case DIE_NMI:
>> case DIE_NMI_IPI:
>> if (ctr_running)
>> model->check_ctrs(args->regs, &__get_cpu_var(cpu_msrs));
>> --- a/arch/x86/oprofile/nmi_timer_int.c
>> +++ b/arch/x86/oprofile/nmi_timer_int.c
>> @@ -25,7 +25,7 @@ static int profile_timer_exceptions_noti
>> int ret = NOTIFY_DONE;
>>
>> switch (val) {
>> - case DIE_NMI:
>> + case DIE_NMI_IPI:
>> oprofile_add_sample(args->regs, 0);
>> ret = NOTIFY_STOP;
>> break;
>> --- a/drivers/char/ipmi/ipmi_watchdog.c
>> +++ b/drivers/char/ipmi/ipmi_watchdog.c
>> @@ -1080,7 +1080,7 @@ ipmi_nmi(struct notifier_block *self, un
>> {
>> struct die_args *args = data;
>>
>> - if (val != DIE_NMI)
>> + if (val != DIE_NMIUNKNOWN)

All watchdogs use DIE_NMIUNKNOWN in this patch, because they should be
processed after both CPU-specific and non-CPU-specific NMIs. Or should we
define a special DIE_NMI_XX for them, like DIE_NMI_WATCHDOG?

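To make the ordering argument concrete, here is a rough userspace sketch
of the dispatch order described in the changelog. It is not kernel code
and not part of the patch: the stage names, the pending flags and
dispatch_nmi() are all hypothetical stand-ins, and the real handler uses
notify_die() and notifier chains rather than a table like this.

```c
/*
 * Userspace model of the proposed NMI dispatch order. Illustration only:
 * every name here is hypothetical and none of it is kernel code.
 */
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2

/* Stages are tried in this order, mirroring the list in the changelog. */
enum nmi_stage {
	STAGE_CPU_SPECIFIC,	/* stands in for DIE_NMI_IPI */
	STAGE_ANY_CPU,		/* stands in for DIE_NMI */
	STAGE_UNKNOWN,		/* stands in for DIE_NMIUNKNOWN (watchdogs) */
	STAGE_COUNT
};

/* Indices into model_sources[]; SRC_NONE means nobody claimed the NMI. */
enum { SRC_GHES, SRC_PERF, SRC_WATCHDOG, SRC_COUNT, SRC_NONE = -1 };

static int perf_pending[MODEL_NR_CPUS];	/* per-CPU: only the owning CPU sees it */
static int ghes_pending;		/* global: any CPU can see and consume it */
static int watchdog_pending;		/* global, but only checked last */

/* Each handler returns 1 if it consumed the NMI, 0 otherwise. */
static int perf_handler(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!perf_pending[cpu])
		return 0;
	perf_pending[cpu] = 0;
	return 1;
}

static int ghes_handler(int cpu)
{
	(void)cpu;		/* non-CPU-specific: the CPU does not matter */
	if (!ghes_pending)
		return 0;
	ghes_pending = 0;
	return 1;
}

static int watchdog_handler(int cpu)
{
	(void)cpu;
	if (!watchdog_pending)
		return 0;
	watchdog_pending = 0;
	return 1;
}

struct nmi_source {
	enum nmi_stage stage;
	int (*handler)(int cpu);
};

/* GHES is listed first on purpose: the stage, not table order, decides. */
static const struct nmi_source model_sources[SRC_COUNT] = {
	[SRC_GHES]     = { STAGE_ANY_CPU,      ghes_handler },
	[SRC_PERF]     = { STAGE_CPU_SPECIFIC, perf_handler },
	[SRC_WATCHDOG] = { STAGE_UNKNOWN,      watchdog_handler },
};

/* Consume at most one NMI source per call; return its index or SRC_NONE. */
static int dispatch_nmi(int cpu)
{
	for (int stage = 0; stage < STAGE_COUNT; stage++)
		for (int i = 0; i < SRC_COUNT; i++)
			if (model_sources[i].stage == (enum nmi_stage)stage &&
			    model_sources[i].handler(cpu))
				return i;
	return SRC_NONE;
}
```

In this model, with perf_pending[1] and ghes_pending both set,
dispatch_nmi(1) consumes the perf event and a later dispatch_nmi(0)
consumes the GHES event. If STAGE_ANY_CPU were tried first, CPU 1 would
consume the GHES event instead and CPU 0 would end up with SRC_NONE,
which is the unknown-NMI case from the changelog.
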
Best Regards,
Huang Ying