Message-ID: <ZkErkG77vLkcVaUZ@gallifrey>
Date: Sun, 12 May 2024 20:50:24 +0000
From: "Dr. David Alan Gilbert" <linux@...blig.org>
To: Muni Sekhar <munisekharrms@...il.com>
Cc: kernelnewbies <kernelnewbies@...nelnewbies.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Inquiry Regarding Handling of Kernel Crashes
* Muni Sekhar (munisekharrms@...il.com) wrote:
> Dear Linux Kernel Community,
Hi,
> I hope this email finds you well. I am currently engaged in testing
> device drivers in Linux kernel mode, and I have encountered various
> types of kernel crashes during my testing process.
> 
> Among these, some examples of kernel crashes include OOPS, lockups and others.
> 
> I have a few questions regarding the handling of kernel crashes during testing:
> 
> When encountering a kernel crash during testing, is it advisable to
> continue testing without rebooting the system? Or is it preferable to
> reboot the system after each kernel crash and then resume testing?
Rebooting is best.
> Can the first kernel crash, whether it is an OOPS,  or any other type
> crash, potentially lead to subsequent crashes of the same or different
> types? If so, should debugging efforts focus only on the first kernel
> crash, or should all subsequent crashes also be considered and
> addressed?
Yes. Not all failures do that, but some will cause follow-on crashes;
looking at the first crash normally gives the most reliable idea
of what went wrong. Keep all the logs though, anything might help you
figure it out.
> In the event that the system needs to be rebooted after a kernel
> crash, how can user space test utilities be informed that a kernel
> crash has occurred? Additionally, how can the system be configured to
> automatically reboot in the event of a kernel crash?
See Documentation/admin-guide/kernel-parameters.txt; there are
quite a few useful ones, in particular:
     oops=panic   will cause a panic after an oops
which, when you combine it with
     panic=30
means an oops will cause a panic, which then causes a reboot
after 30 seconds.
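A minimal sketch of how a test harness might check that both parameters are on the command line the kernel was booted with. The function names are made up for illustration, the parsing is naive (it ignores quoted arguments), and it looks only at the boot command line, not at sysctl values changed at runtime.

```python
def crash_reboot_settings(cmdline: str) -> dict:
    """Pick the oops= and panic= values out of a kernel command line string."""
    settings = {"oops": None, "panic": None}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in settings:
            # A later occurrence overwrites an earlier one.
            settings[key] = value
    return settings


def oops_triggers_reboot(cmdline: str) -> bool:
    """True if an oops should escalate to a panic that then reboots the box."""
    settings = crash_reboot_settings(cmdline)
    try:
        timeout = int(settings["panic"]) if settings["panic"] is not None else 0
    except ValueError:
        timeout = 0
    # panic=0 means "wait forever"; any non-zero value leads to a reboot.
    return settings["oops"] == "panic" and timeout != 0
```

On a running system the input would be the contents of /proc/cmdline, for example oops_triggers_reboot(open("/proc/cmdline").read()).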
You could also consider using a 'crash kernel' (kdump): on a panic
the system boots into a fresh kernel that just saves a memory
snapshot, which you can then try to debug.
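Before starting a test run it can be worth confirming that a crash kernel is actually loaded. A small sketch, assuming the kernel exposes /sys/kernel/kexec_crash_loaded (it reads 1 when a crash kernel is loaded); the helper names are hypothetical.

```python
from pathlib import Path


def flag_is_set(text: str) -> bool:
    """Interpret the contents of a sysfs flag file ('1' means set)."""
    return text.strip() == "1"


def crash_kernel_loaded(path: str = "/sys/kernel/kexec_crash_loaded") -> bool:
    """Return True if the sysfs flag says a crash kernel is loaded.

    A missing or unreadable file is treated as "not loaded".
    """
    try:
        return flag_is_set(Path(path).read_text())
    except OSError:
        return False
```

A harness could call crash_kernel_loaded() at startup and refuse to run (or at least warn) when it returns False, since a panic would then leave no memory snapshot behind.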
Turning on a watchdog as well is good; some kernel bugs just hang
rather than giving a nice oops.
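One way to read "a watchdog" here is a hardware (or softdog) watchdog exposed as /dev/watchdog; that reading is an assumption. With such a device, a user space process keeps writing to it, and if the kernel hangs and the writes stop, the watchdog resets the machine. A rough sketch with made-up function names:

```python
import time


def feed_watchdog(dev, beats, interval_s, sleep=time.sleep):
    """Write a keepalive to the watchdog device 'beats' times.

    Any write to the device counts as a keepalive. If this loop stops
    running (for example because the kernel has hung), the watchdog
    timer expires and the machine is reset.
    """
    for _ in range(beats):
        dev.write(b"\0")
        dev.flush()
        sleep(interval_s)


def disarm_watchdog(dev):
    """Write the 'magic close' character so closing the device stops the timer.

    This only works if the driver supports magic close and was not
    loaded with nowayout set.
    """
    dev.write(b"V")
    dev.flush()


# Typical use (needs root and a watchdog driver):
#   with open("/dev/watchdog", "wb", buffering=0) as wd:
#       feed_watchdog(wd, beats=60, interval_s=5)
#       disarm_watchdog(wd)
```

The keepalive interval has to be shorter than the watchdog timeout configured for the driver, otherwise the machine resets even when nothing is wrong.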
> I would greatly appreciate any insights or best practices you can
> share regarding the handling of kernel crashes during testing. Your
> expertise and guidance on this matter would be invaluable to my
> testing efforts.
> 
> Thank you very much for your time and assistance. I look forward to
> your response.
Good luck!
Dave
> 
> 
> -- 
> Thanks,
> Sekhar
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/