linux-kernel - Re: [PATCH v2] kexec: Fix kdump failure with notsc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1468395865.2020.10.camel@localhost>
Date:	Wed, 13 Jul 2016 07:46:48 +0000
From:	"Wei, Jiangang" <weijg.fnst@...fujitsu.com>
To:	"mingo@...nel.org" <mingo@...nel.org>
CC:	"kexec@...ts.infradead.org" <kexec@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"fenghua.yu@...el.com" <fenghua.yu@...el.com>,
	"ebiederm@...ssion.com" <ebiederm@...ssion.com>,
	"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
	"mingo@...hat.com" <mingo@...hat.com>
Subject: Re: [PATCH v2] kexec: Fix kdump failure with notsc

On Mon, 2016-07-11 at 18:28 +0800, Wei Jiangang wrote:
> Hi , Ingo
> 
> On Fri, 2016-07-08 at 09:38 +0200, Ingo Molnar wrote:
> > * Eric W. Biederman <ebiederm@...ssion.com> wrote:
> > 
> > > Sigh.  Can we please just do the work to rip out the apic shutdown code from the 
> > > kexec on panic code path?
> > > 
> > > I forgetting details but the only reason we have do any apic shutdown is bugs in 
> > > older kernels that could not initialize a system properly if we did not shut 
> > > down the apics.
> > > 
> > > I certainly don't see an issue with goofy cases like notsc not working on a 
> > > crash capture kernel if we are not initializing the hardware properly.
> > > 
> > > The strategy really needs to be to only do the absolutely essential hardware 
> > > shutdown in the crashing kernel, every adintional line of code we execute in the 
> > > crashing kernel increases our chances of hitting a bug.
> > 
> > Fully agreed.
> > 
> > > Under that policy things like requring we don't pass boot options that inhibit 
> > > the dump catpure kernel from initializing the hardware from a random state are 
> > > reasonable requirements.  AKA I don't see any justification in this as to why we 
> > > would even want to support notsc on the dump capture kernel.  Especially when 
> > > things clearly work when that option is not specified.
> > 
> > So at least on the surface it appears 'surprising' that the 'notsc' option (which, 
> > supposedly, disables TSC handling) interferes with being able to fully boot. Even 
> > if 'notsc' is specified we are still using the local APIC, right?
> 
> In most case,  It's no problem that using local APIC while notsc is
> specified.
> But not for kdump.
> 
> We can get evidence, Especially from "Spurious LAPIC timer interrupt on
> cpu 0".
> 
> ###serial log,
> 
> [    0.000000] NR_IRQS:524544 nr_irqs:256 16
> [    0.000000] Spurious LAPIC timer interrupt on cpu 0
> [    0.000000] Console: colour dummy device 80x25
> [    0.000000] console [tty0] enabled
> [    0.000000] console [ttyS0] enabled
> [    0.000000] tsc: Fast TSC calibration using PIT
> [    0.000000] tsc: Detected 2099.947 MHz processor
> [    0.000000] Calibrating delay loop...
> 
> 
> Due to the local apic and local apic timer hasn't been setup and enabled
> fully, The event_handler of clock event is NULL.
> 
> ###codes,
> 
> static void local_apic_timer_interrupt(void)
> {
>     int cpu = smp_processor_id();
>     struct clock_event_device *evt = &per_cpu(lapic_events, cpu);
> 
>     /*
>      * Normally we should not be here till LAPIC has been initialized
> but
>      * in some cases like kdump, its possible that there is a pending
> LAPIC
>      * timer interrupt from previous kernel's context and is delivered
> in
>      * new kernel the moment interrupts are enabled.
>      *
>      * Interrupts are enabled early and LAPIC is setup much later, hence
>      * its possible that when we get here evt->event_handler is NULL.
>      * Check for event_handler being NULL and discard the interrupt as
>      * spurious.
>      */
>     if (!evt->event_handler) {
>         pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", cpu);
>         /* Switch it off */
>         lapic_timer_shutdown(evt);
>         return;
>     }
> 
>      .............
> }
> 
> 
> IMHO, 
> If we specify notsc, the dump-capture kernel waits for jiffies being
> updated early and LAPIC and timer are setup much later, which causes no
> timer interrupts is passed to BSP. as following,
> 
> start_kernel  -->
> 1)-> calibrate_delay()  -> calibrate_delay_converge()  # hang and wait
> for jiffies changed
>                    
> 2)-> rest_init() -> kernel_init() -> .... -> apic_bsp_setup() ->
> setup_local_APIC()
> 
> -> setup_percpu_clockev().
> 
> the setup_percpu_clockev points setup_boot_APIC_clock() which used to
> setup the boot APIC and timer.
> 
> 
> > So it might be a good idea to find the root cause of this bootup fragility even if 
> > 'notsc' is specified. And I fully agree that it should be fixed in the bootup path 
> > of the dump kernel, not the crash kernel reboot path.
> 

Can anyone give some advice or commet on the following idea？
Thanks in advance.

> Because the lapic and timer are not ready when dump-capture waits them
> to update the jiffies value. so I suggest to put APIC in legacy mode in
> local_apic_timer_interrupt() temporarily, which in the bootup path of
> dump kernel. 
> 
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index dcb52850a28f..af3be93997ed 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -879,6 +879,7 @@ static void local_apic_timer_interrupt(void)
>                 pr_warning("Spurious LAPIC timer interrupt on cpu %d\n",
> cpu);
>                 /* Switch it off */
>                 lapic_timer_shutdown(evt);
> +             disable_IO_APIC();
>                 return;
>         }
> 
> And the new solution can fix the problem.
> What‘s your opinion about it?
> 
> Thanks,
> wei
> 
> > 
> > Thanks,
> > 
> > 	Ingo
> > 
> > 
>