Message-ID: <ZkErkG77vLkcVaUZ@gallifrey>
Date: Sun, 12 May 2024 20:50:24 +0000
From: "Dr. David Alan Gilbert" <linux@...blig.org>
To: Muni Sekhar <munisekharrms@...il.com>
Cc: kernelnewbies <kernelnewbies@...nelnewbies.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: Inquiry Regarding Handling of Kernel Crashes
* Muni Sekhar (munisekharrms@...il.com) wrote:
> Dear Linux Kernel Community,
Hi,
> I hope this email finds you well. I am currently engaged in testing
> device drivers in Linux kernel mode, and I have encountered various
> types of kernel crashes during my testing process.
>
> Among these, some examples of kernel crashes include OOPS, lockups and others.
>
> I have a few questions regarding the handling of kernel crashes during testing:
>
> When encountering a kernel crash during testing, is it advisable to
> continue testing without rebooting the system? Or is it preferable to
> reboot the system after each kernel crash and then resume testing?
Rebooting is best.
> Can the first kernel crash, whether it is an OOPS, or any other type
> crash, potentially lead to subsequent crashes of the same or different
> types? If so, should debugging efforts focus only on the first kernel
> crash, or should all subsequent crashes also be considered and
> addressed?
Yes. Not all failures do that, but some will cause follow-on crashes;
looking at the first crash normally gives the most reliable idea
of what went wrong. But keep all the logs; anything might help you
figure it out.
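As a sketch of "look at the first crash, but keep everything", a test harness could scan a saved console or `dmesg` log and report the first crash-like line, while still listing all of them. The marker patterns below are an assumption: they cover common headline strings (oops, BUG, panic, lockups, hung tasks, RCU stalls), but exact wording varies between kernel versions and architectures, so extend them for your own logs.

```python
import re

# Assumed, non-exhaustive headline patterns for kernel crash reports.
CRASH_MARKERS = re.compile(
    r"(Oops:|BUG:|Kernel panic|general protection fault|"
    r"soft lockup|hard LOCKUP|blocked for more than|detected stalls)"
)


def all_crashes(log_text):
    """Return every (line_number, line) that looks like a crash report."""
    return [
        (lineno, line)
        for lineno, line in enumerate(log_text.splitlines(), start=1)
        if CRASH_MARKERS.search(line)
    ]


def first_crash(log_text):
    """Return the first crash-like (line_number, line), or None if there is none."""
    crashes = all_crashes(log_text)
    return crashes[0] if crashes else None
```

Later entries from `all_crashes` may well be fallout from the first one, so it usually makes sense to debug starting from `first_crash` and treat the rest as supporting evidence.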
> In the event that the system needs to be rebooted after a kernel
> crash, how can user space test utilities be informed that a kernel
> crash has occurred? Additionally, how can the system be configured to
> automatically reboot in the event of a kernel crash?
See Documentation/admin-guide/kernel-parameters.txt; there are
quite a few useful ones, in particular:
  oops=panic will cause a panic after an oops,
which, when you combine it with
  panic=30 (reboot 30 seconds after a panic),
means an oops will cause a panic, which in turn causes a reboot.
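A test harness may want to confirm at startup that the machine really was booted with these parameters. The sketch below parses a kernel command line string (on a live system, the contents of /proc/cmdline) and checks for oops=panic plus a non-zero panic= timeout; per the kernel documentation, panic=N with N > 0 reboots after N seconds, N < 0 reboots immediately and 0 waits forever. It is only an approximation: it ignores quoted values containing spaces, the runtime sysctls kernel.panic_on_oops and kernel.panic, and any build-time CONFIG_PANIC_TIMEOUT default.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def reboots_on_oops(cmdline):
    """True if the command line turns an oops into a panic and then a reboot."""
    params = parse_cmdline(cmdline)
    if params.get("oops") != "panic":
        return False
    try:
        timeout = int(params.get("panic") or "0")
    except ValueError:
        return False
    # 0 means "wait forever after a panic", so no automatic reboot.
    return timeout != 0
```

For example, `reboots_on_oops(open("/proc/cmdline").read())` could be asserted before a test run starts.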
You could also consider using a 'crash kernel' (kdump): on a panic,
the system boots into a fresh kernel that just saves a memory
snapshot, which you can then try to debug.
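The crash kernel needs memory reserved with the crashkernel= boot parameter and has to be loaded with kexec, which distributions usually do through a kdump service. Before starting a run it can be worth checking that it is actually in place; Linux reports this in /sys/kernel/kexec_crash_loaded ("1" when a crash kernel is loaded). The sketch below reads that file, taking the path as a parameter so it can be pointed elsewhere.

```python
from pathlib import Path


def crash_kernel_loaded(path="/sys/kernel/kexec_crash_loaded"):
    """Return True if a crash (kdump) kernel is loaded and ready for a panic."""
    try:
        return Path(path).read_text().strip() == "1"
    except FileNotFoundError:
        # No such file: kernel built without crash dump support, or not Linux.
        return False
```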
Turning on a watchdog as well is a good idea; some kernel bugs just
hang the machine rather than giving a nice oops.
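With a hardware watchdog driven through the Linux watchdog device interface, opening /dev/watchdog arms the timer and each write resets it. If the machine hangs, the writes stop and the hardware resets the box. The sketch below shows that keepalive loop; it is an illustration only, since in practice you would normally let systemd (RuntimeWatchdogSec=) or a watchdog daemon do this. The kernel's own lockup detectors are a separate mechanism and can be made to panic via parameters such as softlockup_panic= and hung_task_panic=. The `pings` argument is a hypothetical convenience for testing; `magic_close` writes the 'V' that asks the driver to disarm on close, which drivers built with nowayout ignore.

```python
import os
import time


def feed_watchdog(path="/dev/watchdog", pings=None, interval=10.0, magic_close=False):
    """Keep a watchdog alive by writing to it every `interval` seconds.

    pings=None loops forever; an integer stops after that many keepalives.
    Leaving magic_close=False keeps the watchdog armed after this returns,
    so the machine resets if nothing else takes over feeding it.
    """
    fd = os.open(path, os.O_WRONLY)  # opening the device arms the watchdog
    try:
        count = 0
        while pings is None or count < pings:
            os.write(fd, b"\0")  # any write counts as a keepalive
            count += 1
            if pings is None or count < pings:
                time.sleep(interval)
        if magic_close:
            os.write(fd, b"V")  # request a clean disarm on close
    finally:
        os.close(fd)
```

The interval must stay well below the watchdog's timeout, otherwise a healthy but busy system will be reset too.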
> I would greatly appreciate any insights or best practices you can
> share regarding the handling of kernel crashes during testing. Your
> expertise and guidance on this matter would be invaluable to my
> testing efforts.
>
> Thank you very much for your time and assistance. I look forward to
> your response.
Good luck!
Dave
>
>
> --
> Thanks,
> Sekhar
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/