Date:   Fri, 1 May 2020 00:20:18 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Joerg Roedel <jroedel@...e.de>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Borislav Petkov <bp@...en8.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Shile Zhang <shile.zhang@...ux.alibaba.com>,
        Andy Lutomirski <luto@...capital.net>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Tzvetomir Stoyanov <tz.stoyanov@...il.com>
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before
 text_poke()

On Thu, 30 Apr 2020 22:26:55 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:

> The tracers just have to make sure they perform their vmalloc'd memory
> allocation before registering the tracepoint which can touch it, else they
> need to issue vmalloc_sync_mappings() on their own before making the
> newly allocated memory observable by instrumentation.
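
[ Illustration only, not kernel code: a user-space toy model of the ordering
rule quoted above, assuming (as I understand this thread) that a new vmalloc
mapping first lands in a reference page table and reaches the other page
tables lazily. Every name here (toy_vmalloc, toy_sync_mappings,
toy_probe_can_touch, NCTX, NSLOT) is hypothetical. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCTX  3   /* hypothetical number of page-table copies (contexts) */
#define NSLOT 8   /* hypothetical number of vmalloc-area slots */

/* Toy stand-ins: one reference table plus one lazily updated copy per context. */
static bool reference_map[NSLOT];
static bool ctx_map[NCTX][NSLOT];

/* Models vmalloc(): the new mapping is recorded only in the reference table. */
static void toy_vmalloc(size_t slot)
{
	assert(slot < NSLOT);
	reference_map[slot] = true;
}

/* Models vmalloc_sync_mappings(): propagate the reference table to every context. */
static void toy_sync_mappings(void)
{
	for (size_t c = 0; c < NCTX; c++)
		for (size_t s = 0; s < NSLOT; s++)
			ctx_map[c][s] = reference_map[s];
}

/*
 * Models instrumentation touching the memory from a place where a fault
 * cannot be handled: it only works if this context already has the mapping.
 */
static bool toy_probe_can_touch(size_t ctx, size_t slot)
{
	assert(ctx < NCTX && slot < NSLOT);
	return ctx_map[ctx][slot];
}
```

[ In this model, calling toy_vmalloc(2) alone leaves toy_probe_can_touch(1, 2)
false; it becomes true only after toy_sync_mappings(). That is the sense in
which the sync has to happen before the memory is made observable by
instrumentation. ]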

What gets me is that I added the patch below (which adds a
vmalloc_sync_mappings() just after the alloc_percpu()), but I also recorded
all instances of vmalloc() with a stackdump, and I get this:

          colord-1673  [002] ....    84.764804: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
          colord-1673  [002] ....    84.764807: <stack trace>
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => module_alloc+0x7e/0xd0
 => bpf_jit_binary_alloc+0x70/0x110
 => bpf_int_jit_compile+0x139/0x40a
 => bpf_prog_select_runtime+0xa3/0x120
 => bpf_prepare_filter+0x533/0x5a0
 => sk_attach_filter+0x13/0x50
 => sock_setsockopt+0xd2f/0xf90
 => __sys_setsockopt+0x18a/0x1a0
 => __x64_sys_setsockopt+0x20/0x30
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0


[ the above is from before the tracing started ]

       trace-cmd-1687  [002] ....   103.908850: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
       trace-cmd-1687  [002] ....   103.908856: <stack trace>
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0x23d/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
       trace-cmd-1697  [003] ....   104.088950: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
       trace-cmd-1697  [003] ....   104.088954: <stack trace>
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0x23d/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
       trace-cmd-1697  [003] ....   104.089666: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
       trace-cmd-1697  [003] ....   104.089669: <stack trace>
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0xc1/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
       trace-cmd-1697  [003] ....   104.098920: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
       trace-cmd-1697  [003] ....   104.098924: <stack trace>
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0xc1/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
       trace-cmd-1697  [003] ....   104.114518: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
       trace-cmd-1697  [003] ....   104.114520: <stack trace>
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0xc1/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
       trace-cmd-1697  [003] ....   104.130705: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
       trace-cmd-1697  [003] ....   104.130707: <stack trace>
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0x23d/0x2b0
 => event_pid_write.isra.30+0x21b/0x3b0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
       trace-cmd-1687  [001] ....   106.000510: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
       trace-cmd-1687  [001] ....   106.000514: <stack trace>
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0x23d/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0

The above are the calls that add pids to set_event_pid. (I see I should
probably make that code a bit more efficient; it calls the vmalloc code a
bit too much.)

But what is missing is the call to vmalloc() from alloc_percpu(). In fact, I
put printks in the vmalloc() path under alloc_percpu(), and it doesn't
trigger from the tracing code, although it does show up in my trace from
other areas of the kernel:

     kworker/1:3-204   [001] ....    42.888340: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
     kworker/1:3-204   [001] ....    42.888342: <stack trace>
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => __vmalloc+0x30/0x40
 => pcpu_create_chunk+0x77/0x220
 => pcpu_balance_workfn+0x407/0x650
 => process_one_work+0x25e/0x5c0
 => worker_thread+0x30/0x380
 => kthread+0x139/0x160
 => ret_from_fork+0x3a/0x50

So I'm still not 100% sure why the percpu data is causing a problem.

-- Steve

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8d2b98812625..10e4970a150c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8486,6 +8486,7 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
 		return -ENOMEM;
 
 	buf->data = alloc_percpu(struct trace_array_cpu);
+	vmalloc_sync_mappings();
 	if (!buf->data) {
 		ring_buffer_free(buf->buffer);
 		buf->buffer = NULL;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9a8227afa073..489cf0620edc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2543,6 +2543,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 	void *addr;
 	unsigned long real_size = size;
 
+	trace_printk("vmalloc called here\n");
+	trace_dump_stack(0);
 	size = PAGE_ALIGN(size);
 	if (!size || (size >> PAGE_SHIFT) > totalram_pages())
 		goto fail;
