Message-ID: <20160405120626.GM3448@twins.programming.kicks-ass.net>
Date: Tue, 5 Apr 2016 14:06:26 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Alexei Starovoitov <ast@...com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
"David S . Miller" <davem@...emloft.net>,
Ingo Molnar <mingo@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Wang Nan <wangnan0@...wei.com>, Josef Bacik <jbacik@...com>,
Brendan Gregg <brendan.d.gregg@...il.com>,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
kernel-team@...com
Subject: Re: [PATCH net-next 1/8] perf: optimize perf_fetch_caller_regs
On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
> avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints.
> It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call
> with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by perpcu_alloc
It's not actually allocated; but because it's a static uninitialized
variable we get .bss-like behaviour and the initial value is copied to
all CPUs when the per-cpu allocator thingy bootstraps SMP, IIRC.
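
A rough user-space analogy of the point above, not kernel code: an object
with static storage duration and no initializer is guaranteed by C to start
out zeroed, and toolchains typically place it in .bss. The names
`struct fake_regs`, `fake_perf_regs`, `FAKE_NR_CONTEXTS` and
`fake_regs_all_zero` are made up for illustration, and the per-cpu copying
step is not modelled at all.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs; not the kernel definition. */
struct fake_regs {
	unsigned long ip;
	unsigned long sp;
	unsigned long flags;
};

#define FAKE_NR_CONTEXTS 4	/* assumed count, for illustration only */

/*
 * Static storage duration, no initializer: the C standard guarantees
 * zero initialization, so no explicit allocation or memset is involved.
 */
static struct fake_regs fake_perf_regs[FAKE_NR_CONTEXTS];

/* Returns 1 if every field of every entry is still zero, else 0. */
int fake_regs_all_zero(void)
{
	for (size_t i = 0; i < FAKE_NR_CONTEXTS; i++) {
		if (fake_perf_regs[i].ip || fake_perf_regs[i].sp ||
		    fake_perf_regs[i].flags)
			return 0;
	}
	return 1;
}
```

Calling `fake_regs_all_zero()` before anything has written to the array
should report that all entries are zero.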
> and
> subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs,
> so we can safely drop memset from all of the above cases and
Indeed.
> move it into
> perf_ftrace_function_call that calls it with stack allocated pt_regs.
Hmm, is there a reason that's still on-stack instead of using the
per-cpu thing, Steve?
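
For contrast, a minimal sketch of why the on-stack case keeps its memset,
again as a user-space analogy with made-up names (`struct stack_regs`,
`fake_arch_fetch_caller_regs`, `fake_function_call`): an automatic variable
starts with indeterminate contents, so if the fill routine only writes some
fields, the caller has to clear the rest itself.

```c
#include <string.h>

/* Hypothetical stand-in for struct pt_regs; not the kernel definition. */
struct stack_regs {
	unsigned long ip;
	unsigned long sp;
	unsigned long flags;
	unsigned long other;	/* a field the fill routine never writes */
};

/* Hypothetical arch hook: initializes only a subset of the fields. */
static void fake_arch_fetch_caller_regs(struct stack_regs *regs,
					unsigned long ip, unsigned long sp)
{
	regs->ip = ip;
	regs->sp = sp;
	regs->flags = 0;
}

/* Returns the untouched field, which is zero only because of the memset. */
unsigned long fake_function_call(unsigned long ip, unsigned long sp)
{
	struct stack_regs regs;		/* automatic storage: indeterminate */

	memset(&regs, 0, sizeof(regs));	/* the caller clears it explicitly */
	fake_arch_fetch_caller_regs(&regs, ip, sp);
	return regs.other;
}
```

Without the memset, reading `regs.other` would read an uninitialized
object, which is exactly what the static zero-initialized case avoids.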
> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
In any case,
Acked-by: Peter Zijlstra (Intel) <peterz@...radead.org>