Date:   Mon, 28 Nov 2016 15:10:34 -0800
From:   Andi Kleen <andi@...stfloor.org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Andi Kleen <andi@...stfloor.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Jiri Olsa <jolsa@...hat.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
        Josh Triplett <josh@...htriplett.org>,
        Jan Stancek <jstancek@...hat.com>
Subject: Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

> Well, actually it's been optimized a lot from the first instances.
> Note, tracepoints need to be quite generic, so where it can be optimized
> is not obvious.

There is quite a bit of stuff that doesn't change from tracepoint to
tracepoint, so you could cache all these decisions in a single per-cpu
variable and invalidate it whenever anything changes the conditions.
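A rough userspace sketch of what I mean (names and structure are mine, not
kernel code; the real thing would use a per-cpu variable rather than a plain
global): fold all the per-event checks into one cached boolean, keyed on a
generation counter that gets bumped whenever any condition changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Bumped whenever any tracing condition changes (filters, soft-disable,
 * buffer swaps, ...). Starts at 1 so a zero-initialized cache is stale. */
static unsigned long cond_generation = 1;

struct trace_cache {
	unsigned long gen;	/* generation the decision was computed for */
	bool enabled;		/* cached combined decision */
};

/* Stand-in for the full set of slow-path checks. */
static bool slow_path_recompute(void)
{
	return true;
}

static bool trace_enabled_cached(struct trace_cache *c)
{
	if (c->gen != cond_generation) {		/* cache stale? */
		c->enabled = slow_path_recompute();	/* slow path, once */
		c->gen = cond_generation;
	}
	return c->enabled;	/* fast path: two loads and a compare */
}

static void invalidate_all(void)
{
	cond_generation++;	/* every cached decision is now stale */
}
```

The fast path then never re-derives the decision; anything that changes the
conditions just bumps the generation, and the next event recomputes once.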

> > 
> > Functions with # of instructions:
> > 
> >     25 trace_event_raw_event_sched_switch
> 
> This is from the TRACE_EVENT macro.
> 
>         if (trace_trigger_soft_disabled(trace_file))                    \
>                 return;                                                 \

Cache?

>         if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
>             trace_event_ignore_this_pid(trace_file))
>                 return NULL;
> 
>         *current_rb = trace_file->tr->trace_buffer.buffer;

Indirection could be cached.

> 
>         if ((trace_file->flags &
>              (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
>             (entry = this_cpu_read(trace_buffered_event))) {

Multiple tests of trace_file->flags in different functions; these
could all be folded into a single test.
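Something like the following (the flag names mirror the kernel's, but the
bit values and the combined mask are made up for illustration): instead of
testing EVENT_FILE_FL_PID_FILTER in one function and the SOFT_DISABLED /
FILTERED pair in another, test one combined mask once on the fast path.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values are illustrative, not the kernel's actual encoding. */
#define EVENT_FILE_FL_SOFT_DISABLED	(1u << 0)
#define EVENT_FILE_FL_FILTERED		(1u << 1)
#define EVENT_FILE_FL_PID_FILTER	(1u << 2)

/* Hypothetical combined mask: any flag that forces the slow path. */
#define EVENT_FILE_FL_ANY_SLOW \
	(EVENT_FILE_FL_SOFT_DISABLED |	\
	 EVENT_FILE_FL_FILTERED |	\
	 EVENT_FILE_FL_PID_FILTER)

static bool needs_slow_path(unsigned int flags)
{
	/* One load, one AND, one branch on the fast path. */
	return (flags & EVENT_FILE_FL_ANY_SLOW) != 0;
}
```

If none of the slow-path flags are set, the event takes a single branch
instead of several spread across the call chain.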

> 
> 
> Note: Some of these if statements can be encapsulated with jump labels, as they
> are false pretty much all of the time.
> 
> 
>                 /* Try to use the per cpu buffer first */
>                 val = this_cpu_inc_return(trace_buffered_event_cnt);
>                 if (val == 1) {
>                         trace_event_setup(entry, type, flags, pc);
>                         entry->array[0] = len;
>                         return entry;
>                 }
>                 this_cpu_dec(trace_buffered_event_cnt);

This can be pre-cached for a fast path: if the conditions hold, point
a per-cpu variable directly at the buffer, and invalidate it whenever
the buffer changes or fills up.
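Sketched below in userspace C with a single "CPU" (all names are mine; the
real version would be a per-cpu pointer). The idea: while every precondition
holds, a pointer aims straight at a usable event slot, so reserving an event
is one load and one test; anything that breaks a precondition (buffer swap,
buffer full, recording disabled) NULLs the pointer and forces the slow path.

```c
#include <assert.h>
#include <stddef.h>

struct ring_entry {
	unsigned int len;
	char data[64];
};

static struct ring_entry slot_storage;
static struct ring_entry *fast_slot;	/* NULL => must take slow path */

/* Called once the slow path has validated all conditions. */
static void fast_path_enable(void)
{
	fast_slot = &slot_storage;
}

/* Called by anything that breaks a precondition (swap, full, disable). */
static void fast_path_invalidate(void)
{
	fast_slot = NULL;
}

static struct ring_entry *reserve_event(unsigned int len)
{
	struct ring_entry *e = fast_slot;

	if (e) {		/* fast path: pointer already validated */
		e->len = len;
		return e;
	}
	/* Slow path: would re-check every condition and possibly
	 * repopulate fast_slot; elided in this sketch. */
	return NULL;
}
```

The per-event cost collapses to a pointer load and a branch, with all the
condition checking pushed to the (rare) invalidation events.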

>         preempt_disable_notrace();
> 
>         if (unlikely(atomic_read(&buffer->record_disabled)))
>                 goto out;

Cache. 
> 
>         cpu = raw_smp_processor_id();
> 
>         if (unlikely(!cpumask_test_cpu(cpu, buffer->cpumask)))
>                 goto out;

Not needed for per-cpu buffers in a fast path.

> 
>         cpu_buffer = buffer->buffers[cpu];
> 
>         if (unlikely(atomic_read(&cpu_buffer->record_disabled)))
>                 goto out;
Cache.

> 
>         if (unlikely(length > BUF_MAX_DATA_SIZE))
>                 goto out;
> 
>         if (unlikely(trace_recursive_lock(cpu_buffer)))
>                 goto out;

Locking should invalidate the cached state on this CPU, so this check
shouldn't be needed.

>         u64 diff;
> 
>         rb_start_commit(cpu_buffer);
> 
> #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
>         /*
>          * Due to the ability to swap a cpu buffer from a buffer
>          * it is possible it was swapped before we committed.
>          * (committing stops a swap). We check for it here and
>          * if it happened, we have to fail the write.
>          */
>         barrier();
>         if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
>                 local_dec(&cpu_buffer->committing);
>                 local_dec(&cpu_buffer->commits);
>                 return NULL;
>         }

Swapping should invalidate the cached state centrally (and then fall
back to a slow path like the current one).

>         /*
>          * We allow for interrupts to reenter here and do a trace.
>          * If one does, it will cause this original code to loop
>          * back here. Even with heavy interrupts happening, this
>          * should only happen a few times in a row. If this happens
>          * 1000 times in a row, there must be either an interrupt
>          * storm or we have something buggy.
>          * Bail!
>          */
>         if (RB_WARN_ON(cpu_buffer, ++nr_loops > 1000))
>                 goto out_fail;

didn't you disable interrupts earlier?

>         /* Did the write stamp get updated already? */
>         if (likely(info.ts >= cpu_buffer->write_stamp)) {
>                 info.delta = diff;
>                 if (unlikely(test_time_stamp(info.delta)))
>                         rb_handle_timestamp(cpu_buffer, &info);
>         }

Not sure why a write stamp is needed. Isn't that just the last few entries?

...

-Andi
