Message-ID: <20250923130457.901085554@kernel.org>
Date: Tue, 23 Sep 2025 09:04:57 -0400
From: Steven Rostedt <rostedt@...nel.org>
To: linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org
Cc: Masami Hiramatsu <mhiramat@...nel.org>,
 Mark Rutland <mark.rutland@....com>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Peter Zijlstra <peterz@...radead.org>,
 Namhyung Kim <namhyung@...nel.org>,
 Takaya Saeki <takayas@...gle.com>,
 Tom Zanussi <zanussi@...nel.org>,
 Thomas Gleixner <tglx@...utronix.de>,
 Ian Rogers <irogers@...gle.com>,
 Douglas Raillard <douglas.raillard@....com>
Subject: [PATCH v2 0/8] tracing: Show contents of syscall trace event user space fields


As of commit 654ced4a1377 ("tracing: Introduce tracepoint_is_faultable()")
system call trace events allow faulting in user space memory. Have some of
the system call trace events take advantage of this.

Introduce a way to read from user space addresses from the syscall trace
event. The way this is accomplished is by creating a per CPU temporary
buffer that is used to read unsafe user memory.

When a syscall trace event needs to read user memory, it reads the per CPU
sched switch counter. It then disables migration, enables preemption,
copies the user space memory into this buffer, and disables preemption again.
If the counter still has its original value, the buffer is valid. Otherwise
the read must be retried. This is similar to how seqcount works, but it uses
the per CPU sched switch counter as its sequence counter. If the counter
has changed, another task was scheduled in, and that task could have
used the same buffer and overwritten the data.
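
The retry loop described above can be sketched in plain user space C. This is
only an illustration of the idea, not the code from the patches: switch_count,
cpu_buf, pending_switches and both functions are hypothetical stand-ins, and
the "context switch" is simulated by a test knob rather than by the scheduler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 128

/* Stand-ins for kernel state; every name here is hypothetical. */
static unsigned long switch_count;   /* per CPU context switch counter */
static char cpu_buf[BUF_SIZE];       /* per CPU temporary buffer */
static int pending_switches;         /* test knob: switches to simulate */

/* Models the window where preemption is enabled and the copy may fault. */
static void copy_with_preemption(char *dst, const char *src, size_t len)
{
	memcpy(dst, src, len);
	if (pending_switches > 0) {
		/* Another task ran on this CPU and reused the shared buffer. */
		pending_switches--;
		switch_count++;
		memset(dst, 'X', len);
	}
}

/* Returns the number of attempts needed to get a stable copy. */
static int read_user_stable(const char *src, size_t len)
{
	unsigned long seen;
	int tries = 0;

	assert(src != NULL);
	if (len > BUF_SIZE)
		len = BUF_SIZE;

	do {
		seen = switch_count;              /* sample the counter */
		copy_with_preemption(cpu_buf, src, len);
		tries++;
	} while (seen != switch_count);       /* changed: buffer may be stale */

	return tries;
}
```

With pending_switches set to 2, the first two attempts are discarded and the
third one leaves the expected bytes in cpu_buf.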

A new file is created in the tracefs directory (and also per instance) that
allows the user to shorten the amount copied from user space. It can be
completely disabled if set to zero (it will only display "" or (, ...)
but no copying from user space will be performed). The max size to copy is
hard coded to 128, which should be enough for this purpose.
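
As a rough sketch of how such a limit could be applied (the helper name and
its signature are hypothetical, and USER_BUF_MAX simply reuses the 128 quoted
above as a placeholder), the amount copied is the smaller of what the syscall
asked for and the configured limit, with zero disabling the copy:

```c
#include <assert.h>
#include <stddef.h>

#define USER_BUF_MAX 128   /* placeholder: the hard limit quoted above */

/* Hypothetical helper: bytes to copy for one user space field. */
static size_t user_copy_len(size_t requested, size_t limit)
{
	if (limit > USER_BUF_MAX)
		limit = USER_BUF_MAX;
	/* limit == 0 yields 0, so nothing is copied from user space. */
	return requested < limit ? requested : limit;
}

/* Usage: expected results for a few settings. */
static void user_copy_len_examples(void)
{
	assert(user_copy_len(6, 63) == 6);      /* short string fits */
	assert(user_copy_len(200, 63) == 63);   /* shortened by the user */
	assert(user_copy_len(200, 0) == 0);     /* copying disabled */
	assert(user_copy_len(200, 4096) == USER_BUF_MAX);
}
```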

This allows the output to look like this:

 sys_access(filename: 0x7f8c55368470 "/etc/ld.so.preload", mode: 4)
 sys_execve(filename: 0x564ebcf5a6b8 "/usr/bin/emacs", argv: 0x7fff357c0300, envp: 0x564ebc4a4820)
 sys_write(fd: 1, buf: 0x56430f353be0 (2f:72:6f:6f:74:0a) "/root.", count: 6)
 sys_sethostname(name: 0x5584310eb2a0 "debian", len: 6)
 sys_renameat2(olddfd: 0xffffff9c, oldname: 0x7ffe02facdff "/tmp/x", newdfd: 0xffffff9c, newname: 0x7ffe02face06 "/tmp/y", flags: 1)
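
The sys_write line above shows the buffer both as hex bytes and as printable
text, with the newline rendered as ".". A minimal user space sketch of that
formatting, assuming a hypothetical format_user_array() helper that is not
taken from the patches, could look like this:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical formatter: writes `(2f:72:...) "text"` into out.
 * Non-printable bytes are shown as '.' in the quoted part.
 * Returns the string length, or -1 if out is too small.
 */
static int format_user_array(char *out, size_t out_size,
			     const unsigned char *buf, size_t len)
{
	static const char hex[] = "0123456789abcdef";
	/* hex digits and colons + text + "()", space, two quotes, NUL */
	size_t need = (len ? 3 * len - 1 : 0) + len + 6;
	size_t pos = 0, i;

	assert(out != NULL && (buf != NULL || len == 0));
	if (out_size < need)
		return -1;

	out[pos++] = '(';
	for (i = 0; i < len; i++) {
		if (i)
			out[pos++] = ':';
		out[pos++] = hex[buf[i] >> 4];
		out[pos++] = hex[buf[i] & 0xf];
	}
	out[pos++] = ')';
	out[pos++] = ' ';
	out[pos++] = '"';
	for (i = 0; i < len; i++)
		out[pos++] = isprint(buf[i]) ? (char)buf[i] : '.';
	out[pos++] = '"';
	out[pos] = '\0';
	return (int)pos;
}
```

Feeding it the six bytes of "/root\n" produces the same
(2f:72:6f:6f:74:0a) "/root." text as in the sys_write line.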


Changes since v1: https://lore.kernel.org/linux-trace-kernel/20250805192646.328291790@kernel.org/

- Removed __rcu annotation from the fields that do not need RCU to protect
  them.

- Wrap newsfstat in
  #if defined(__ARCH_WANT_NEW_STAT) || defined(__ARCH_WANT_STAT64)
  as parisc failed to build without it. (kernel test robot)

- Fixed allocation of sinfo which used sizeof(sinfo) and not
  sizeof(*sinfo) (kernel test robot)

- Instead of incrementing a counter via the sched_switch tracepoint, use
  the nr_context_switches() API. (Mathieu Desnoyers).

- Use the length saved in the meta data of the event to limit the size of
  the string printed ("%.*s", len, str).

- Add comment describing that the method to read the memory from user
  space is similar to how seqcount works.

- Wrap kexec_file_load in
  #if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32
  to not break the i386 build.

- Added __user annotation to variable copying from user (kernel test robot)

- Change default to 63 (127 seemed too much)

- Change the max to 165 to fill in the extra data.

- Use the size macros of the max size and max args to calculate the size
  of the buffer to save the values in.

- Added new patch to show printable characters of binary arrays that are
  displayed.    
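
To illustrate the "%.*s" item in the list above: the precision argument caps
how many bytes are read from the string, so data that is not NUL terminated
can still be printed safely. The sketch below uses a hypothetical
print_bounded() helper in user space; it is not the code from the patches.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print at most len bytes of str, in quotes. */
static int print_bounded(char *out, size_t out_size, const char *str, int len)
{
	assert(out != NULL && str != NULL && len >= 0);
	/*
	 * With an explicit precision, printf reads no more than len bytes,
	 * so str does not need a terminating NUL.
	 */
	return snprintf(out, out_size, "\"%.*s\"", len, str);
}
```

Called with the six bytes of "debian" and len set to 6, it writes the quoted
hostname; a smaller len shortens the output accordingly.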


Steven Rostedt (8):
      tracing: Replace syscall RCU pointer assignment with READ/WRITE_ONCE()
      tracing: Have syscall trace events show "0x" for values greater than 10
      tracing: Have syscall trace events read user space string
      tracing: Have system call events record user array data
      tracing: Display some syscall arrays as strings
      tracing: Allow syscall trace events to read more than one user parameter
      tracing: Add syscall_user_buf_size to limit amount written
      tracing: Show printable characters in syscall arrays

----
 Documentation/trace/ftrace.rst |   8 +
 include/trace/syscall.h        |   8 +-
 kernel/trace/Kconfig           |  13 +
 kernel/trace/trace.c           |  52 +++
 kernel/trace/trace.h           |   7 +-
 kernel/trace/trace_syscalls.c  | 700 +++++++++++++++++++++++++++++++++++++++--
 6 files changed, 756 insertions(+), 32 deletions(-)
