Message-ID: <8dee042080bf3a00d376afc271b050eb@nuclearcat.com>
Date: Sat, 18 Feb 2017 10:09:39 +0200
From: Denys Fedoryshchenko <nuclearcat@...learcat.com>
To: Jon Masters <jcm@...masters.org>
Cc: linux-kernel@...r.kernel.org, kexec@...ts.infradead.org
Subject: Re: kexec on panic
On 2017-02-18 09:42, Jon Masters wrote:
> Hi Denys,
>
> On 02/10/2017 03:14 AM, Denys Fedoryshchenko wrote:
>
>> After years of using kexec, and a recent unpleasant experience with
>> modern (supposedly blazing fast to boot) hardware that needs 5-10
>> minutes just to pass POST tests, one question came to mind:
>> Is it possible somehow to execute a regular kexec (not the special
>> "panic" one used to capture crash data) on panic to reduce reboot time?
>
> Generally, you don't want to do this, because various platform hardware
> might be in non-quiescent states (still doing DMA to random memory, etc.)
> and other nastiness that means you don't want to do more than the minimal
> amount in a kexec on panic (crash). We've seen no end of fun and games
> even with just regular crash dumps while hardware is busily writing to
> memory that it shouldn't be. An IOMMU helps, but isn't a cure-all.
>
> Jon.
Well, I have to try. I sometimes hit non-booting hardware even with
regular kexec, but a small customer has an HP server that needs almost
6 minutes to boot, no hot spare (hard to arrange for many reasons: no
spare 10G ports, hardware cost, etc.) and some nasty bugs that are not
resolved yet, which forces me to look for a way to reduce reboot time.
If I find a way to save the backtrace and reboot quickly, it will help
a lot to debug kernels with minimal downtime when a bug is reproducible
only on a live system.
What I did for now might be insanely wrong, but:
diff -Naur linux-4.9.9-vanilla/kernel/kexec_core.c linux-4.9.9/kernel/kexec_core.c
--- linux-4.9.9-vanilla/kernel/kexec_core.c	2017-02-09 07:08:40.000000000 +0000
+++ linux-4.9.9/kernel/kexec_core.c	2017-02-17 12:54:49.000000000 +0000
@@ -897,6 +897,10 @@
 			machine_crash_shutdown(&fixed_regs);
 			machine_kexec(kexec_crash_image);
 		}
+		if (kexec_image) {
+			machine_shutdown();
+			machine_kexec(kexec_image);
+		}
 		mutex_unlock(&kexec_mutex);
 	}
 }
Then
kexec -l /mnt/flash/kernel --append="intel_idle.max_cstate=0 processor.max_cstate=1"
and
echo c >/proc/sysrq-trigger
worked even on a busy network router, but I'm not sure it will behave
the same on a real networking stack crash.
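
Before triggering the test crash it may be worth checking that the
regular image is really loaded, because with the patch above the panic
path only reaches kexec_image when no crash image is present. A minimal
sketch, assuming the /sys/kernel/kexec_loaded and
/sys/kernel/kexec_crash_loaded flags are available on the running
kernel; the function names and the directory argument are made up for
illustration and are not part of kexec-tools:

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of kexec-tools.

# Print the content of a one-line flag file, or "0" if it is missing.
read_flag() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo 0
    fi
}

# Succeed only if a regular kexec image is loaded and no crash image is,
# so that a panic would take the new kexec_image branch of the patch.
# $1: directory holding the flag files (defaults to /sys/kernel).
ready_for_fast_reboot() {
    dir="${1:-/sys/kernel}"
    [ "$(read_flag "$dir/kexec_loaded")" = "1" ] &&
        [ "$(read_flag "$dir/kexec_crash_loaded")" = "0" ]
}
```

With these helpers the test could be guarded as
"ready_for_fast_reboot && echo c >/proc/sysrq-trigger", so the crash is
only triggered when the fast path is expected to be taken.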