Message-ID: <20170506230315.GA5553@gmail.com>
Date:   Sun, 7 May 2017 01:05:28 +0200
From:   Adrien Mahieux <adrien.mahieux@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: [RFC] NMI: Generic per-code NMI handler (panic/kdump)

Hello,


I'm new to the LKML, so if I make mistakes, please tell me, along with
the correct way to do things (or the doc I've read but forgotten).

I've written a small module to manage NMI events based on their code, so a
sysadmin can drop them (avoid console messages) or panic the kernel (kdump).
https://github.com/Saruspete/nmimgr/blob/master/nmimgr.c

So far it has worked as expected in large-scale production, with different
kernels.
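
To illustrate the per-code idea, here is a minimal userspace sketch of a
lookup from an NMI reason code to an action. It is not taken from nmimgr.c:
the enum, the struct and the function name are hypothetical, and an in-kernel
handler would do this lookup from its NMI callback rather than from a plain
function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical actions; names are illustrative, not those of nmimgr.c. */
enum nmi_action {
    NMI_ACT_PASS,   /* no rule matched: leave the event to other handlers */
    NMI_ACT_DROP,   /* swallow the event (no console message) */
    NMI_ACT_PANIC   /* panic so that kdump can capture a dump */
};

/* One rule: an NMI reason code and what to do when it is seen. */
struct nmi_rule {
    unsigned char code;
    enum nmi_action action;
};

/* Return the action configured for `code`, or NMI_ACT_PASS if none matches. */
static enum nmi_action nmi_lookup(const struct nmi_rule *rules, size_t n,
                                  unsigned char code)
{
    size_t i;

    assert(rules != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (rules[i].code == code)
            return rules[i].action;
    }
    return NMI_ACT_PASS;
}
```

With a table such as { {0x2c, NMI_ACT_PANIC}, {0x3c, NMI_ACT_DROP} } (made-up
codes), nmi_lookup() returns NMI_ACT_PANIC for 0x2c, NMI_ACT_DROP for 0x3c and
NMI_ACT_PASS for anything else.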


As a newbie, I have some questions I didn't find answers to in the docs:

- My code supports multiple kernel versions with the KERNEL_VERSION macro,
  but I read that this is not recommended and that the code should just
  compile against master's head. May I leave this as is to ease the
  distribution maintainers' work?

- In what subsystem/file should it go?
  arch/x86/kernel/nmi.c (but should be for all archs)
  kernel/watchdog.c     (but not a watchdog)
  drivers/char/ipmi     (but not an IPMI nor a driver)

- How do I know where to place its Kconfig menus? It's easy for drivers, but
  what about this one?

- If someone has time to review the code and point out cases I didn't think
  of, I would be happy to fix them.
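
For the first question above, this is the kind of version guard meant. It is
a sketch that compiles in userspace: the macro is redefined locally with the
classic encoding from <linux/version.h>, the 4.11 build target is an
assumption, and the 3.2 cut-off (around where x86 gained
register_nmi_handler(), as far as I know) is only an illustration.
HAVE_REGISTER_NMI_HANDLER is a hypothetical name.

```c
#include <assert.h>

/* Classic encoding used by KERNEL_VERSION in <linux/version.h>. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif

/* In a real build this comes from the kernel headers; assumed here. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(4, 11, 0)
#endif

/* Illustrative cut-off: pick the NMI registration API by kernel version. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 2, 0)
#define HAVE_REGISTER_NMI_HANDLER 1   /* hypothetical feature flag */
#else
#define HAVE_REGISTER_NMI_HANDLER 0   /* fall back to the older interface */
#endif

/* Report which branch the preprocessor selected. */
static int have_register_nmi_handler(void)
{
    return HAVE_REGISTER_NMI_HANDLER;
}
```

Each such guard is one more branch to maintain, which is the cost the
question is about: an in-tree version would keep only the branch that matches
the tree it is built with.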



Here are some real-life uses of this module:

- When my servers are frozen, I generate an NMI from IPMI "power diag". But
  the event code changes between hardware vendors (even between different
  generations from the same vendor), and I have some specific hardware (like
  FPGAs) that generates NMIs as well, plus near-dead parts that generate some
  too, so I can't use the *nmi_panic sysctls.

- The hpwdt module registers an equivalent of panic upon any NMI event. I
  still want the watchdog, but only upon iLO and ASR NMIs, not all others.

- During a kdump, some servers may take a lot of time to dump memory. If the
  server receives another NMI, it'll reboot and lose the current dump.
  Dropping all NMIs acts as a fence during the kdump.

To make it easier to use, I've added a "setup.sh" to the repo that builds and
configures the kmod with the NMI events matching the current hardware (HP,
Dell, IBM, VirtualBox...).



Thanks for your guidance.

Adrien.
