Message-Id: <cover.1519911559.git.rahul.lakkireddy@chelsio.com>
Date:   Fri,  2 Mar 2018 17:49:56 +0530
From:   Rahul Lakkireddy <rahul.lakkireddy@...lsio.com>
To:     linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        kexec@...ts.infradead.org
Cc:     davem@...emloft.net, ebiederm@...ssion.com,
        akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
        ganeshgr@...lsio.com, nirranjan@...lsio.com, indranil@...lsio.com,
        Rahul Lakkireddy <rahul.lakkireddy@...lsio.com>
Subject: [RFC 0/2] kernel: add support to collect hardware logs in panic

On production servers running a variety of workloads over time, a kernel
panic can happen sporadically after days or even months. It is
important to collect as many debug logs as possible to root-cause
and fix a problem that may not be easy to reproduce. A snapshot of the
underlying hardware/firmware state (register dump, firmware
logs, adapter memory, etc.) at the time of the kernel panic is very
helpful when debugging the culprit device driver.

This patch series adds a new generic framework that enables device
drivers to collect a device-specific snapshot of the hardware/firmware
state of the underlying device at the time of kernel panic. The
collected logs are appended to vmcore along with details, such as the
start address and length of the logs, which are required for
extraction during post-analysis.

Device drivers can use crash_driver_dump_register() to register a
callback that collects device-specific hardware/firmware logs during
kernel panic (i.e. before booting into the second kernel).
Drivers can unregister with crash_driver_dump_unregister().
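
For readers who want a feel for the flow without reading patch 1, here
is a rough user-space sketch of the register / collect-on-panic /
unregister pattern. It is not the kernel API: the mock_* names, the
struct layout, the callback signature and the fixed-size table are all
assumptions made for illustration. The real interface is defined in
include/linux/crash_core.h by patch 1.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DUMPERS 8   /* hypothetical limit, for this sketch only */

/* Hypothetical callback type: fill buf with exactly size bytes of
 * device state and return 0 on success. */
typedef int (*dump_cb_t)(void *priv, void *buf, size_t size);

struct driver_dump {
	const char *name;   /* identifies the device, e.g. driver + PCI address */
	size_t size;        /* bytes the callback will produce */
	dump_cb_t collect;  /* invoked on the (simulated) panic path */
	void *priv;         /* driver-private context */
	int in_use;
};

static struct driver_dump dumpers[MAX_DUMPERS];

/* Stand-in for a driver registering its dump callback. */
int mock_dump_register(const char *name, size_t size, dump_cb_t cb, void *priv)
{
	for (int i = 0; i < MAX_DUMPERS; i++) {
		if (!dumpers[i].in_use) {
			dumpers[i] = (struct driver_dump){ name, size, cb, priv, 1 };
			return 0;
		}
	}
	return -1; /* table full */
}

/* Stand-in for a driver unregistering, e.g. on device removal. */
int mock_dump_unregister(const char *name)
{
	for (int i = 0; i < MAX_DUMPERS; i++) {
		if (dumpers[i].in_use && strcmp(dumpers[i].name, name) == 0) {
			dumpers[i].in_use = 0;
			return 0;
		}
	}
	return -1; /* not registered */
}

/* Simulated panic path: run every registered callback and pack the
 * results back to back into out. Returns the number of bytes written. */
size_t mock_collect_all(unsigned char *out, size_t cap)
{
	size_t off = 0;

	for (int i = 0; i < MAX_DUMPERS; i++) {
		struct driver_dump *d = &dumpers[i];

		if (!d->in_use || d->size > cap - off)
			continue;
		if (d->collect(d->priv, out + off, d->size) == 0)
			off += d->size;
	}
	return off;
}

/* Example callback: a fake "register dump" filled with 0xAB. */
int fake_regs_dump(void *priv, void *buf, size_t size)
{
	(void)priv;
	memset(buf, 0xAB, size);
	return 0;
}
```

A caller would register with mock_dump_register("fake0", 16,
fake_regs_dump, NULL), call mock_collect_all() where the panic handler
would run, and later call mock_dump_unregister("fake0").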

To extract the device-specific hardware/firmware logs using crash:

crash> help -D | grep DRIVERDUMP
DRIVERDUMP=(cxgb4_0000:02:00.4, ffffb131090bd000, 37782968)

crash> rd ffffb131090bd000 37782968 -r hardware.log
37782968 bytes copied from 0xffffb131090bd000 to hardware.log
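
If the extraction is scripted, the DRIVERDUMP line shown above can be
split into its three fields (name, start address, length). The helper
below is only a sketch: parse_driverdump() and struct driverdump_entry
are hypothetical names, and it assumes the format seen in the sample
output, i.e. a hexadecimal address without a 0x prefix and a decimal
length in bytes.

```c
#include <stdio.h>

struct driverdump_entry {
	char name[64];            /* e.g. "cxgb4_0000:02:00.4" */
	unsigned long long addr;  /* start address of the logs */
	unsigned long long len;   /* length of the logs in bytes */
};

/* Parse one "DRIVERDUMP=(name, addr, len)" line.
 * Returns 0 on success, -1 if the line does not match. */
int parse_driverdump(const char *line, struct driverdump_entry *e)
{
	int n = sscanf(line, "DRIVERDUMP=(%63[^,], %llx, %llu)",
		       e->name, &e->addr, &e->len);

	return n == 3 ? 0 : -1;
}
```

The parsed address and length are the two arguments passed to
"rd ... -r" in the session above.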

Patch 1 adds an API that allows drivers to register a callback to
collect the device-specific hardware/firmware logs.

Patch 2 shows a cxgb4 driver example that uses the API to collect
hardware/firmware logs during kernel panic.

Suggestions and feedback will be much appreciated.

Thanks,
Rahul

Rahul Lakkireddy (2):
  kernel/crash_core: add API to collect hardware dump in kernel panic
  cxgb4: collect hardware dump in kernel panic

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |  6 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c | 95 +++++++++++++++++++++++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |  4 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  | 12 +++
 include/linux/crash_core.h                       | 33 ++++++++
 kernel/crash_core.c                              | 83 ++++++++++++++++++++-
 kernel/kexec_core.c                              |  1 +
 7 files changed, 229 insertions(+), 5 deletions(-)

-- 
2.14.1
