linux-kernel - Re: Inquiry Regarding Handling of Kernel Crashes

Message-ID: <ZkErkG77vLkcVaUZ@gallifrey>
Date: Sun, 12 May 2024 20:50:24 +0000
From: "Dr. David Alan Gilbert" <linux@...blig.org>
To: Muni Sekhar <munisekharrms@...il.com>
Cc: kernelnewbies <kernelnewbies@...nelnewbies.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Inquiry Regarding Handling of Kernel Crashes

* Muni Sekhar (munisekharrms@...il.com) wrote:
> Dear Linux Kernel Community,

Hi,

> I hope this email finds you well. I am currently engaged in testing
> device drivers in Linux kernel mode, and I have encountered various
> types of kernel crashes during my testing process.
> 
> Among these, some examples of kernel crashes include OOPS, lockups and others.
> 
> I have a few questions regarding the handling of kernel crashes during testing:
> 
> When encountering a kernel crash during testing, is it advisable to
> continue testing without rebooting the system? Or is it preferable to
> reboot the system after each kernel crash and then resume testing?

Rebooting is best.
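
One way a test harness could decide whether a reboot is due is to look at the kernel taint mask, where bit 7 (value 128, letter 'D') is set once the kernel has hit an OOPS or BUG. The following is only a minimal sketch: the function names are made up for illustration, and it assumes a Linux system that exposes /proc/sys/kernel/tainted.

```python
from pathlib import Path

# Bit 7 of the taint mask (value 128, letter 'D') is set after an OOPS or BUG;
# see Documentation/admin-guide/tainted-kernels.rst.
TAINT_DIE = 1 << 7


def kernel_has_oopsed(taint_value: int) -> bool:
    """Return True if the taint mask records at least one OOPS or BUG."""
    return bool(taint_value & TAINT_DIE)


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint mask from procfs (Linux only)."""
    return int(Path(path).read_text().strip())
```

A harness might call kernel_has_oopsed(read_taint()) between test cases and schedule a reboot when it returns True. A hang or lockup may never set this bit, so it is a hint rather than a guarantee.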

> Can the first kernel crash, whether it is an OOPS or any other type
> of crash, potentially lead to subsequent crashes of the same or different
> types? If so, should debugging efforts focus only on the first kernel
> crash, or should all subsequent crashes also be considered and
> addressed?

Yes - not all failures do that, but some will cause follow-on crashes;
looking at the first crash normally gives the most reliable idea
of what went wrong. But keep all the logs; anything might help you figure
it out.
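
To pick the first crash out of a saved log, something like the sketch below could work. The marker list is an assumption, not an exhaustive catalogue of kernel crash messages, and first_crash is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Assumed, non-exhaustive set of strings that commonly open a crash report.
# The \b keeps "BUG: " from matching inside words such as "DEBUG: ".
CRASH_MARKERS = re.compile(
    r"\bBUG: |\bOops: |Kernel panic - not syncing|general protection fault"
)


def first_crash(log_text: str) -> Optional[Tuple[int, str]]:
    """Return (line number, line) of the earliest crash marker, or None."""
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if CRASH_MARKERS.search(line):
            return lineno, line
    return None
```

Feeding it the full text of a saved dmesg or serial console capture gives a starting point for reading; the later reports stay in the log for reference.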

> In the event that the system needs to be rebooted after a kernel
> crash, how can user space test utilities be informed that a kernel
> crash has occurred? Additionally, how can the system be configured to
> automatically reboot in the event of a kernel crash?

See Documentation/admin-guide/kernel-parameters.txt; there are
quite a few useful ones, in particular:
     oops=panic   will cause a panic after an oops
which, when you combine it with
     panic=30     (reboot 30 seconds after a panic)

means an oops will then cause a panic, which causes a reboot.
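
Before starting a test run, a harness could check that the machine was actually booted with these parameters. This is a minimal sketch with a hypothetical helper name; it only parses a command-line string and does not look at runtime sysctl settings.

```python
from typing import Optional, Tuple


def panic_settings(cmdline: str) -> Tuple[bool, Optional[int]]:
    """Return (oops=panic present, panic= timeout in seconds or None)."""
    oops_panic = False
    timeout: Optional[int] = None
    for token in cmdline.split():
        if token == "oops=panic":
            oops_panic = True
        elif token.startswith("panic="):
            try:
                # If panic= appears more than once, the last value is kept.
                timeout = int(token[len("panic="):])
            except ValueError:
                pass  # ignore a malformed value rather than guess
    return oops_panic, timeout
```

On a running system the string would come from /proc/cmdline, for example panic_settings(open("/proc/cmdline").read()). The kernel.panic and kernel.panic_on_oops sysctls can change the behaviour after boot, so the command line alone may not tell the whole story.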

You could also consider using a 'crash kernel' (kdump) - on a panic,
that boots into a fresh kernel which just saves a memory snapshot
that you can then try to debug.

Turning on a watchdog is a good idea as well; some kernel bugs just hang
rather than giving a nice oops.
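
From user space, a watchdog is normally kept alive by writing to /dev/watchdog at regular intervals; if the writes stop because the machine has hung, the watchdog resets it. Below is a minimal sketch, assuming a watchdog driver is loaded (hardware, or the softdog module) and that it supports the "magic close" character 'V'. The function name and parameters are illustrative only.

```python
import time
from typing import BinaryIO, Callable


def feed_watchdog(dev: BinaryIO, rounds: int, interval: float,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Write to the watchdog device `rounds` times, then request a clean stop."""
    for _ in range(rounds):
        dev.write(b"\0")  # any write resets the watchdog countdown
        dev.flush()
        sleep(interval)
    # Writing 'V' just before close asks the driver to disarm the watchdog.
    # Drivers without magic-close support, or with nowayout set, ignore this.
    dev.write(b"V")
    dev.flush()
```

A harness might run this in a background thread while tests execute, for example with open("/dev/watchdog", "wb", buffering=0) as dev: feed_watchdog(dev, rounds=10, interval=5.0). The interval has to be shorter than the watchdog timeout, and opening the device usually requires root.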

> I would greatly appreciate any insights or best practices you can
> share regarding the handling of kernel crashes during testing. Your
> expertise and guidance on this matter would be invaluable to my
> testing efforts.
> 
> Thank you very much for your time and assistance. I look forward to
> your response.

Good luck!

Dave

> 
> 
> -- 
> Thanks,
> Sekhar
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/