linux-kernel - Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200430122558.406c9755@gandalf.local.home>
Date:   Thu, 30 Apr 2020 12:25:58 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Joerg Roedel <jroedel@...e.de>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Borislav Petkov <bp@...en8.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Shile Zhang <shile.zhang@...ux.alibaba.com>,
        Andy Lutomirski <luto@...capital.net>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Tzvetomir Stoyanov <tz.stoyanov@...il.com>
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before
 text_poke()

On Thu, 30 Apr 2020 12:16:20 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:

> ----- On Apr 30, 2020, at 12:11 PM, rostedt rostedt@...dmis.org wrote:
> 
> > On Thu, 30 Apr 2020 16:11:21 +0200
> > Joerg Roedel <jroedel@...e.de> wrote:
> >   
> >> Hi,
> >> 
> >> On Wed, Apr 29, 2020 at 10:07:31AM -0400, Steven Rostedt wrote:  
> >> > Talking with Mathieu about this on IRC, he pointed out that my code does
> >> > have a vzalloc() that is called:
> >> > 
> >> > in trace_pid_write()
> >> > 
> >> > 	pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
> >> > 
> >> > This is done when -P1,2 is on the trace-cmd command line.  
> >> 
> >> Okay, tracked it down, some instrumentation in the page-fault and
> >> double-fault handler gave me the stack-traces. Here is what happens:
> >> 
> >> As already pointed out, it all happens because of page-faults on the
> >> vzalloc'ed pid bitmap. It starts with this stack-trace:
> >> 
> >>  RIP: 0010:trace_event_ignore_this_pid+0x23/0x30  
> > 
> > Interesting. Because that function is this:
> > 
> > bool trace_event_ignore_this_pid(struct trace_event_file *trace_file)
> > {
> >	struct trace_array *tr = trace_file->tr;
> >	struct trace_array_cpu *data;
> >	struct trace_pid_list *no_pid_list;
> >	struct trace_pid_list *pid_list;
> > 
> >	pid_list = rcu_dereference_raw(tr->filtered_pids);
> >	no_pid_list = rcu_dereference_raw(tr->filtered_no_pids);
> > 
> >	if (!pid_list && !no_pid_list)
> >		return false;
> > 
> >	data = this_cpu_ptr(tr->array_buffer.data);
> > 
> >	return data->ignore_pid;
> > }
> > 
> > Where it only sees if the pid masks exist. That is, it looks to see if
> > there's pointers to them, it doesn't actually touch the vmalloc'd area.
> > This check is to handle a race between allocating and deallocating the
> > buffers and setting the ignore_pid bit. The reading of these arrays is done
> > at sched_switch time, which sets or clears the ignore_pid field.
> > 
> > That said, since this only happens on buffer instances (it does not trigger
> > on the top level instance, which uses the same code for the pid masks)
> > 
> > Could this possibly be for the tr->array_buffer.data, which is allocated
> > with:
> > 
> > allocate_trace_buffer() {
> >	[..]
> >	buf->data = alloc_percpu(struct trace_array_cpu);
> > 
> > That is, the bug isn't the vmalloc being a problem, but perhaps the per_cpu
> > allocation. This would explain why this crashes with the buffer instance
> > and not with the top level instance. If it was related to the pid masks,
> > then it would trigger for either (because they act the same in allocating
> > at time of use). But when an instance is made, the tr->array_buffer.data is
> > created. Which for the top level happens at boot up and the pages would
> > have been synced long ago. But for a newly created instance, this happens
> > just before its used. This could possibly explain why it's not a problem
> > when doing it manually by hand, because the time between creating the
> > instance, and the time to start and stop the tracing, is long enough for
> > something to sync them page tables.
> > 
> > tl;dr; It's not an issue with the vmalloc, it's an issue with per_cpu
> > allocations!  
> 
> Did I mention that alloc_percpu uses:
> 
> static void *pcpu_mem_zalloc(size_t size, gfp_t gfp)
> {
>         if (WARN_ON_ONCE(!slab_is_available()))
>                 return NULL;
> 
>         if (size <= PAGE_SIZE)
>                 return kzalloc(size, gfp);
>         else
>                 return __vmalloc(size, gfp | __GFP_ZERO, PAGE_KERNEL);
> }
> 
> So yeah, it's vmalloc'd memory when size > PAGE_SIZE.
> 

I certainly hope that struct trace_array_cpu is not bigger than PAGE_SIZE!

-- Steve