Message-ID: <20250910052335.1151048-1-wangjinchao600@gmail.com>
Date: Wed, 10 Sep 2025 13:23:09 +0800
From: Jinchao Wang <wangjinchao600@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Mike Rapoport <rppt@...nel.org>,
"Naveen N . Rao" <naveen@...nel.org>,
Andrey Ryabinin <ryabinin.a.a@...il.com>,
Alexander Potapenko <glider@...gle.com>,
Andrey Konovalov <andreyknvl@...il.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Vincenzo Frascino <vincenzo.frascino@....com>,
kasan-dev@...glegroups.com,
"David S. Miller" <davem@...emloft.net>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
"Liang, Kan" <kan.liang@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
linux-mm@...ck.org,
linux-trace-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org
Cc: linux-kernel@...r.kernel.org,
Jinchao Wang <wangjinchao600@...il.com>
Subject: [PATCH v3 00/19] mm/ksw: Introduce real-time Kernel Stack Watch debugging tool
This patch series introduces **KStackWatch**, a lightweight kernel debugging tool
for detecting kernel stack corruption in real time.
The motivation comes from scenarios where corruption occurs silently in one function
but manifests later as a crash in another, with no direct call trace linking the two.
Such bugs are often extremely hard to debug with existing tools, and KASAN may not
reproduce the issue because of its heavy overhead.
I demonstrate this scenario in **test2 (silent corruption test)**.
KStackWatch works by combining a hardware breakpoint with kprobe and fprobe.
It can watch a stack canary or a selected local variable and detect the moment the
corruption actually occurs. This lets developers pinpoint the real source instead of
only observing the final crash.
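To make the detection model concrete, here is a small userspace sketch in C. It is
not KStackWatch code: the names (`struct watch`, `watch_arm()`, `watched_store()`,
`run_scenario()`) are hypothetical, and routing every store through `watched_store()`
stands in for the hardware breakpoint trap. It only illustrates why catching the
write itself identifies the culprit, while a later check would see just the damage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a single hardware watchpoint slot. */
struct watch {
	bool armed;
	uintptr_t addr;     /* start of the watched range (e.g. the canary) */
	size_t len;         /* length of the watched range in bytes */
	const char *hit_by; /* first writer into the range, or NULL */
};

static struct watch wp;

/* Models the probe that arms the watchpoint. */
static void watch_arm(const void *addr, size_t len)
{
	assert(len > 0);
	wp.armed = true;
	wp.addr = (uintptr_t)addr;
	wp.len = len;
	wp.hit_by = NULL;
}

/* Models the function-exit probe that disarms it. */
static void watch_disarm(void)
{
	wp.armed = false;
}

/* Every store in the model goes through here; real hardware traps instead. */
static void watched_store(uint8_t *dst, uint8_t val, const char *who)
{
	uintptr_t p = (uintptr_t)dst;

	if (wp.armed && !wp.hit_by && p >= wp.addr && p - wp.addr < wp.len)
		wp.hit_by = who; /* recorded at the moment of the write */
	*dst = val;
}

/* A toy stack frame: a local buffer followed by a canary. */
struct frame {
	uint8_t buf[8];
	uint8_t canary[8];
};

/*
 * Buggy helper: writes n bytes from the start of buf without checking
 * buf's size. The cap at sizeof(*f) only keeps the model well-defined.
 */
static void buggy_fill(struct frame *f, size_t n)
{
	uint8_t *bytes = (uint8_t *)f;

	for (size_t i = 0; i < n && i < sizeof(*f); i++)
		watched_store(&bytes[i], 0x41, "buggy_fill");
}

/* Returns who wrote to the canary while it was watched, or NULL. */
static const char *run_scenario(size_t n)
{
	struct frame f;

	memset(&f, 0, sizeof(f));
	watch_arm(f.canary, sizeof(f.canary));
	buggy_fill(&f, n);
	watch_disarm();
	return wp.hit_by;
}
```

With `n = 12` the model reports `buggy_fill` as the writer; with `n = 8` the
canary is untouched and nothing is reported.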
Key features include:
- Low overhead, with minimal impact on bug reproducibility
- Real-time detection of stack corruption
- Simple configuration through `/proc/kstackwatch`
- Support for a recursion-depth filter
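The recursion-depth filter can be pictured with a similar userspace sketch. The
per-task counter layout, the 0-based depth numbering and `target_depth` are
assumptions made for illustration, not the series' actual data structures: the
entry hook arms the watch only when the probed function is entered at the
configured depth, and the exit hook disarms it when that same frame returns.

```c
#include <assert.h>

/* Hypothetical per-task state for the probed function. */
struct task_state {
	int depth;    /* current recursion depth, 0 for the outermost call */
	int armed;    /* number of times the watch was armed */
	int disarmed; /* number of times the watch was disarmed */
};

/* Hypothetical configured depth; 0-based in this sketch. */
static int target_depth = 2;

/* Entry hook: arm only for the frame at the configured depth. */
static void probe_entry(struct task_state *t)
{
	if (t->depth == target_depth)
		t->armed++;
	t->depth++;
}

/* Exit hook: disarm when the armed frame returns. */
static void probe_exit(struct task_state *t)
{
	assert(t->depth > 0);
	t->depth--;
	if (t->depth == target_depth)
		t->disarmed++;
}

/* The "probed" function: recurses n more times. */
static void probed(struct task_state *t, int n)
{
	probe_entry(t);
	if (n > 0)
		probed(t, n - 1);
	probe_exit(t);
}
```

Keeping the counter per task, as the V1 changelog describes, means recursion in
one task is not confused with calls made by another task scheduled in between.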
To validate the approach, the series includes a test module and a test script.
---
Changelog
V3:
Main changes:
* Use modify_wide_hw_breakpoint_local() (from Masami)
* Add atomic flag to restrict /proc/kstackwatch to a single opener
* Protect stack probe with an atomic PID flag
* Handle CPU hotplug for watchpoints
* Add preempt_disable/enable in ksw_watch_on_local_cpu()
* Introduce const struct ksw_config *ksw_get_config(void) and use it
* Switch to global watch_attr, remove struct watch_info
* Validate local_var_len in parser()
* Handle case when canary is not found
* Use dump_stack() instead of show_regs() to allow module build
Cleanups:
* Reduce logging and comments
* Format logs with KBUILD_MODNAME
* Remove unused headers
Documentation:
* Add new document
V2:
https://lore.kernel.org/all/20250904002126.1514566-1-wangjinchao600@gmail.com/
* Make hardware breakpoint and stack operations architecture-independent.
V1:
https://lore.kernel.org/all/20250828073311.1116593-1-wangjinchao600@gmail.com/
Core Implementation
* Replaced kretprobe with fprobe for function exit hooking, as suggested
by Masami Hiramatsu
* Introduced per-task depth logic to track recursion across scheduling
* Removed the use of workqueue for a more efficient corruption check
* Reordered patches for better logical flow
* Simplified and improved commit messages throughout the series
* Removed the initial archcheck, which should be improved later
Testing and Architecture
* Replaced the multiple-thread test with silent corruption test
* Split self-tests into a separate patch to improve clarity.
Maintenance
* Added a new entry for KStackWatch to the MAINTAINERS file.
RFC:
https://lore.kernel.org/lkml/20250818122720.434981-1-wangjinchao600@gmail.com/
---
The series is structured as follows:
Jinchao Wang (18):
x86/hw_breakpoint: introduce arch_reinstall_hw_breakpoint() for atomic context
mm/ksw: add build system support
mm/ksw: add ksw_config struct and parser
mm/ksw: add /proc/kstackwatch interface
mm/ksw: add HWBP pre-allocation
mm/ksw: add atomic watch on/off operations
mm/ksw: support CPU hotplug
mm/ksw: add probe management helpers
mm/ksw: resolve stack watch addr and len
mm/ksw: add recursive depth tracking
mm/ksw: manage start/stop of stack watching
mm/ksw: add self-debug helpers
mm/ksw: add test module
mm/ksw: add stack overflow test
mm/ksw: add silent corruption test case
mm/ksw: add recursive stack corruption test
tools/ksw: add test script
docs: add KStackWatch document
Masami Hiramatsu (Google) (1):
HWBP: Add modify_wide_hw_breakpoint_local() API
Documentation/dev-tools/kstackwatch.rst | 94 ++++++++
MAINTAINERS | 7 +
arch/Kconfig | 10 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/hw_breakpoint.h | 1 +
arch/x86/kernel/hw_breakpoint.c | 50 +++++
include/linux/hw_breakpoint.h | 6 +
kernel/events/hw_breakpoint.c | 36 ++++
mm/Kconfig.debug | 21 ++
mm/Makefile | 1 +
mm/kstackwatch/Makefile | 8 +
mm/kstackwatch/kernel.c | 239 ++++++++++++++++++++
mm/kstackwatch/kstackwatch.h | 53 +++++
mm/kstackwatch/stack.c | 276 ++++++++++++++++++++++++
mm/kstackwatch/test.c | 259 ++++++++++++++++++++++
mm/kstackwatch/watch.c | 205 ++++++++++++++++++
tools/kstackwatch/kstackwatch_test.sh | 40 ++++
17 files changed, 1307 insertions(+)
create mode 100644 Documentation/dev-tools/kstackwatch.rst
create mode 100644 mm/kstackwatch/Makefile
create mode 100644 mm/kstackwatch/kernel.c
create mode 100644 mm/kstackwatch/kstackwatch.h
create mode 100644 mm/kstackwatch/stack.c
create mode 100644 mm/kstackwatch/test.c
create mode 100644 mm/kstackwatch/watch.c
create mode 100755 tools/kstackwatch/kstackwatch_test.sh
--
2.43.0