linux-kernel - [for-linus][PATCH 0/5] tracing: Clean up and fixes for v6.18

Message-ID: <20251009152359.604267051@kernel.org>
Date: Thu, 09 Oct 2025 11:23:59 -0400
From: Steven Rostedt <rostedt@...nel.org>
To: linux-kernel@...r.kernel.org
Cc: Masami Hiramatsu <mhiramat@...nel.org>,
 Mark Rutland <mark.rutland@....com>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Andrew Morton <akpm@...ux-foundation.org>
Subject: [for-linus][PATCH 0/5] tracing: Clean up and fixes for v6.18


tracing clean up and fixes for v6.18:

- Have osnoise tracer use memdup_user_nul()

  The function osnoise_cpus_write() open codes a kmalloc(), a
  copy_from_user(), and then adds a NUL byte at the end, which is the
  same as simply using memdup_user_nul().
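
  As a rough illustration of the allocate, copy, and terminate pattern
  that the helper folds into one call, here is a userspace sketch. The
  name dup_with_nul() is hypothetical and uses malloc()/memcpy(); the
  real memdup_user_nul() copies from user space and returns an ERR_PTR()
  on failure, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the kernel's memdup_user_nul():
 * duplicate len bytes from src into a fresh allocation and append a
 * terminating NUL. Returns NULL on failure; the caller frees the result.
 */
char *dup_with_nul(const void *src, size_t len)
{
	char *buf;

	assert(src != NULL);

	if (len == SIZE_MAX)		/* len + 1 would overflow */
		return NULL;

	buf = malloc(len + 1);		/* one extra byte for the terminator */
	if (!buf)
		return NULL;

	memcpy(buf, src, len);
	buf[len] = '\0';		/* the step the open-coded version did by hand */
	return buf;
}
```

  A caller passes the raw bytes and their length and gets back a string
  it can parse directly, with a single failure path to check.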

- Fix wakeup and irq tracers when failing to acquire calltime

  When the wakeup and irq tracers use the function graph tracer for
  tracing function times, it saves a timestamp into the fgraph shadow
  stack. It is possible that this could fail to be stored. If that
  happens, it exits the routine early. These functions also disable
  nesting of the operations by incrementing the data "disable" counter.
  But if the routine exits early because the calltime was not acquired,
  it never restores the counter to what it needs to be.

  Since there are only a couple of lines of code that do work after
  acquiring the calltime, instead of exiting out early, reverse the
  if statement to be true if calltime is acquired, and place the code
  that is to be done within that if block. The clean up will always
  be done after that.
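
  The shape of that change can be sketched in plain C. Everything here
  is hypothetical (struct tracer_data, graph_return(), and the "total"
  field are illustration only, not the tracer code), and it assumes the
  counter is restored by a decrement at the end of the handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-tracer state; only the nesting counter matters here. */
struct tracer_data {
	int disabled;			/* nonzero while a handler is running */
	unsigned long long total;	/* accumulated durations */
};

/*
 * Hypothetical return handler. calltime is NULL when the timestamp could
 * not be stored at function entry.
 */
void graph_return(struct tracer_data *data,
		  const unsigned long long *calltime,
		  unsigned long long now)
{
	assert(data != NULL);

	data->disabled++;		/* block nested invocations */

	if (calltime) {
		/* the only work that needs the timestamp */
		data->total += now - *calltime;
	}

	data->disabled--;		/* cleanup runs on every path */
}
```

  With the work inside the if block there is no early return left that
  could skip the final decrement.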

- Fix ring_buffer_map() return value on failure of __rb_map_vma()

  If __rb_map_vma() fails in ring_buffer_map(), it does not return
  an error. This means the caller will be working against a bad vma
  mapping. Have ring_buffer_map() return an error when __rb_map_vma()
  fails.
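
  A minimal sketch of the propagation pattern, with hypothetical names
  (inner_map_step() and outer_map() stand in for the inner and outer
  mapping functions and do not reflect the real code):

```c
#include <assert.h>

/* Hypothetical inner step: returns 0 or the negative error it is given. */
int inner_map_step(int simulated_err)
{
	return simulated_err;
}

/* Hypothetical outer function: hands the inner result back to the caller. */
int outer_map(int simulated_err)
{
	int err;

	assert(simulated_err <= 0);	/* kernel style: 0 or negative errno */

	err = inner_map_step(simulated_err);
	if (err) {
		/* undo any earlier setup here, then report the failure */
		return err;
	}

	return 0;
}
```

  The caller can then test the return value before touching the mapping.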

- Fix regression of writing to the trace_marker file

  A bug fix was made to change __copy_from_user_inatomic() to
  copy_from_user_nofault() in the trace_marker write function.
  The trace_marker file is used by applications to write into
  it (usually with a file descriptor opened at the start of the
  program) to record into the tracing system. It's usually used
  in critical sections so the write to trace_marker is highly
  optimized.

  The reason for copying in an atomic section is that the write
  reserves space on the ring buffer and then writes directly into
  it. After it writes, it commits the event. The time between
  reserve and commit must have preemption disabled.

  The trace marker write does not have any locking nor can it
  allocate due to the nature of it being a critical path.

  Unfortunately, converting __copy_from_user_inatomic() to
  copy_from_user_nofault() caused a regression in Android.
  Now all the writes from its applications trigger a fault that
  is rejected by the _nofault() version but was not rejected by
  the _inatomic() version. Instead of getting data, it now just
  gets a trace buffer filled with:

    tracing_mark_write: <faulted>

  To fix this, on opening of the trace_marker file, allocate
  per CPU buffers that can be used by the write call. Then
  when entering the write call, do the following:

    preempt_disable();
    cpu = smp_processor_id();
    do {
        cnt = nr_context_switches_cpu(cpu);
        migrate_disable();
        preempt_enable();
        ret = copy_from_user(buffer, ptr, size);
        preempt_disable();
        migrate_enable();
    } while (!ret && cnt != nr_context_switches_cpu(cpu));
    if (!ret)
        ring_buffer_write(buffer);
    preempt_enable();

  This works similarly to seqcount. Because it must enable preemption
  to do a copy_from_user() into a per CPU buffer, the buffer could be
  corrupted by another task if it gets preempted.
  To handle this, read the number of context switches of the current
  CPU, disable migration, enable preemption, copy the data from
  user space, then immediately disable preemption again.
  If the number of context switches is the same, the buffer
  is still valid. Otherwise it must be assumed that the buffer may
  have been corrupted and it needs to try again.

  Now the trace_marker write can get the user data even if it has
  to fault it in, and still not grab any locks of its own.
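
  The retry logic described above can be modeled in user space. This is
  only a sketch under stated assumptions: struct cpu_model, the copy_fn
  hook, read_into_scratch(), and flaky_copy() are all hypothetical, and
  preemption is simulated by a hook that bumps the counter and scribbles
  on the buffer. It does not model migration or real scheduling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one CPU: a context-switch counter and a scratch buffer. */
struct cpu_model {
	unsigned long switches;
	char scratch[64];
};

/*
 * Hypothetical copy hook standing in for copy_from_user(). Returns 0 on
 * success. It may bump cpu->switches to model another task running
 * during the copy.
 */
typedef int (*copy_fn)(struct cpu_model *cpu, const char *src,
		       size_t len, void *ctx);

/*
 * Copy len bytes into the per-CPU scratch buffer, retrying while the
 * context-switch counter changed across the copy. Returns 0 on success
 * or the hook's nonzero error.
 */
int read_into_scratch(struct cpu_model *cpu, copy_fn copy,
		      const char *src, size_t len, void *ctx)
{
	unsigned long cnt;
	int ret;

	assert(cpu != NULL && copy != NULL);
	assert(len <= sizeof(cpu->scratch));

	do {
		cnt = cpu->switches;		/* snapshot before the copy */
		ret = copy(cpu, src, len, ctx);	/* the preemptible window */
	} while (!ret && cnt != cpu->switches);	/* switched: buffer is suspect */

	return ret;
}

/* Example hook: the first *ctx calls are "preempted" and leave garbage behind. */
int flaky_copy(struct cpu_model *cpu, const char *src, size_t len, void *ctx)
{
	int *preemptions_left = ctx;

	memcpy(cpu->scratch, src, len);
	if (*preemptions_left > 0) {
		(*preemptions_left)--;
		cpu->switches++;		/* another task ran here */
		memset(cpu->scratch, 'X', len);	/* and reused the buffer */
	}
	return 0;
}
```

  The loop exits only when a copy completes with no context switch in
  between, so the data handed to the ring buffer write is the data that
  was just copied.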

  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace/fixes

Head SHA1: 64cf7d058a005c5c31eb8a0b741f35dc12915d18


Ankit Khushwaha (1):
      ring buffer: Propagate __rb_map_vma return value to caller

Steven Rostedt (3):
      tracing: Fix wakeup tracers on failure of acquiring calltime
      tracing: Fix irqoff tracers on failure of acquiring calltime
      tracing: Have trace_marker use per-cpu data to read user space

Thorsten Blum (1):
      tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nul

----
 kernel/trace/ring_buffer.c        |   2 +-
 kernel/trace/trace.c              | 268 +++++++++++++++++++++++++++++++-------
 kernel/trace/trace_irqsoff.c      |  23 ++--
 kernel/trace/trace_osnoise.c      |  11 +-
 kernel/trace/trace_sched_wakeup.c |  16 +--
 5 files changed, 241 insertions(+), 79 deletions(-)