[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20250216001413.5f2c05d95af12b1f1f01062e@kernel.org>
Date: Sun, 16 Feb 2025 00:14:13 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Linux Trace Kernel
<linux-trace-kernel@...r.kernel.org>, Masami Hiramatsu
<mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Vincent Donnefort <vdonnefort@...gle.com>
Subject: Re: [PATCH v2] tracing: Do not allow mmap() of persistent ring
buffer
On Fri, 14 Feb 2025 11:55:47 -0500
Steven Rostedt <rostedt@...dmis.org> wrote:
> From: Steven Rostedt <rostedt@...dmis.org>
>
> When trying to mmap a trace instance buffer that is attached to
> reserve_mem, it would crash:
>
> BUG: unable to handle page fault for address: ffffe97bd00025c8
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 2862f3067 P4D 2862f3067 PUD 0
> Oops: Oops: 0000 [#1] PREEMPT_RT SMP PTI
> CPU: 4 UID: 0 PID: 981 Comm: mmap-rb Not tainted 6.14.0-rc2-test-00003-g7f1a5e3fbf9e-dirty #233
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:validate_page_before_insert+0x5/0xb0
> Code: e2 01 89 d0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <48> 8b 46 08 a8 01 75 67 66 90 48 89 f0 8b 50 34 85 d2 74 76 48 89
> RSP: 0018:ffffb148c2f3f968 EFLAGS: 00010246
> RAX: ffff9fa5d3322000 RBX: ffff9fa5ccff9c08 RCX: 00000000b879ed29
> RDX: ffffe97bd00025c0 RSI: ffffe97bd00025c0 RDI: ffff9fa5ccff9c08
> RBP: ffffb148c2f3f9f0 R08: 0000000000000004 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
> R13: 00007f16a18d5000 R14: ffff9fa5c48db6a8 R15: 0000000000000000
> FS: 00007f16a1b54740(0000) GS:ffff9fa73df00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffe97bd00025c8 CR3: 00000001048c6006 CR4: 0000000000172ef0
> Call Trace:
> <TASK>
> ? __die_body.cold+0x19/0x1f
> ? __die+0x2e/0x40
> ? page_fault_oops+0x157/0x2b0
> ? search_module_extables+0x53/0x80
> ? validate_page_before_insert+0x5/0xb0
> ? kernelmode_fixup_or_oops.isra.0+0x5f/0x70
> ? __bad_area_nosemaphore+0x16e/0x1b0
> ? bad_area_nosemaphore+0x16/0x20
> ? do_kern_addr_fault+0x77/0x90
> ? exc_page_fault+0x22b/0x230
> ? asm_exc_page_fault+0x2b/0x30
> ? validate_page_before_insert+0x5/0xb0
> ? vm_insert_pages+0x151/0x400
> __rb_map_vma+0x21f/0x3f0
> ring_buffer_map+0x21b/0x2f0
> tracing_buffers_mmap+0x70/0xd0
> __mmap_region+0x6f0/0xbd0
> mmap_region+0x7f/0x130
> do_mmap+0x475/0x610
> vm_mmap_pgoff+0xf2/0x1d0
> ksys_mmap_pgoff+0x166/0x200
> __x64_sys_mmap+0x37/0x50
> x64_sys_call+0x1670/0x1d70
> do_syscall_64+0xbb/0x1d0
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> The reason was that the code that maps the ring buffer pages to user space
> has:
>
> page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]);
>
> And uses that in:
>
> vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);
>
> But virt_to_page() does not work with vmap()'d memory which is what the
> persistent ring buffer has. It is rather trivial to allow this, but for
> now just disable mmap() of instances that have their ring buffer from the
> reserve_mem option.
>
> If an mmap() is performed on a persistent buffer it will return -ENODEV
> just like it would if the .mmap field wasn't defined in the
> file_operations structure.
>
Looks good to me.
Tested-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>
Thanks,
> Cc: stable@...r.kernel.org
> Fixes: e645535a954ad ("tracing: Add option to use memmapped memory for trace boot instance")
> Signed-off-by: Steven Rostedt (Google) <rostedt@...dmis.org>
> ---
> Changes since v1: https://lore.kernel.org/20250213180737.061871ae@gandalf.local.home
>
> - Return -ENODEV instead of -EINVAL as that is what is returned when the .mmap
> field is missing from the file_operations structure.
>
> kernel/trace/trace.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 25ff37aab00f..0e6d517e74e0 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -8279,6 +8279,10 @@ static int tracing_buffers_mmap(struct file *filp, struct vm_area_struct *vma)
> struct trace_iterator *iter = &info->iter;
> int ret = 0;
>
> + /* Currently the boot mapped buffer is not supported for mmap */
> + if (iter->tr->flags & TRACE_ARRAY_FL_BOOT)
> + return -ENODEV;
> +
> ret = get_snapshot_map(iter->tr);
> if (ret)
> return ret;
> --
> 2.47.2
>
--
Masami Hiramatsu (Google) <mhiramat@...nel.org>
Powered by blists - more mailing lists