Message-ID: <ec5fe6b1-a116-fb60-42c6-dc8a9dedfc15@linux.intel.com>
Date: Fri, 9 Aug 2019 18:16:02 +0300
From: Alexey Budankov <alexey.budankov@...ux.intel.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Cc: Jiri Olsa <jolsa@...hat.com>, Namhyung Kim <namhyung@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Andi Kleen <ak@...ux.intel.com>,
Kan Liang <kan.liang@...ux.intel.com>,
"Jin, Yao" <yao.jin@...ux.intel.com>
Subject: [PATCH v1 0/3] collect LBR callstack together with thread stack data
This patch set enables collection of LBR call stack data together with the
raw thread stack data captured by the --call-graph dwarf,SIZE option:

  $ perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test

The collected LBR call stack can be used to augment the dwarf call stack
computed from the raw thread stack data, and to provide more complete call
stack information when the collected SIZE bytes do not cover the whole
thread stack.

Such cases are typical for workloads that allocate large arrays of data on
their thread stacks, or where the SIZE to collect cannot be made large enough
due to the nature of the workload or the system configuration. This is where
hardware-captured LBR call stacks can provide the missing stack frames. A
description of a possible dwarf plus LBR call stack consolidation algorithm
follows.

With this patch set, the perf report UI still ignores the collected LBR
call stack data and continues to show dwarf-based call stack information.

===========================================================================
Overview:
Legend:
THS - thread stack
CTX - thread register context
SWS - software stack
SSF - skipped stack frames
PSS - Perf sample stack
ip,sp,bp - HW registers values
d - allocated stack regions
kip - ip address in the kernel space
K - captured thread stack size
     THS
    -----
    |   |<-stack bottom
     ...
    |---|
    |ip4|
    |---|         PSS = SWS(THS(K))
    |   |
--> |   |
 |  |d3 |                    user/
 |  |---|        user PSS  kernel PSS
 |  |ip3|         ------     ------
 |  |---|         |SSF |     |SSF |
 |  |   |          ....       ....
 |  |   |         ------     ------
 |  |d2 |         | -1 |     | -1 |
    |---|  user   ------     ------
 K  |ip2|  CTX    |ip3 |     |ip3 |
    |---|         |----|     |----|
 |  |d1 |   ...   |ip2 |  ,  |ip2 |
 |  |---|  |---|  |----|     |----|
 |  |ip1|  |bp0|  |ip1 |     |ip1 |
 |  |---|  |---|  |----|     |----|
 |  |   |  |ip0|->|ip0 |     |ip0 |<-user stack top
 |  |   |  |---|  ------     ------
 |  |   |<-|sp0|<-stack      |kip0|<-kernel stack bottom
--> -----  -----  top        |----|
                             |kip1|
                             |----|
                             |kip2|
                             |----|
                              ....
                             |    |<-kernel stack top
                             ------
Algorithm details:
Legend:
HWS - hardware stack
K-SWS - kernel software stack
                         BRANCH
                         TABLE
              HWS        ip   ip
                        from  to
             ------    -----------
             |ip7`|    |ip7`|    |
             |----|    |----|----|
             |ip6`|    |ip6`|    |
   user PSS  |----|    |----|----|
             |ip5`|    |ip5`|    |
    ------   |----|    |----|----|
    | -1 |   |ip4`|    |ip4`|    |
    ------   |----|    |----|----|
    |ip3 |~~~|ip3`|    |ip3`|    |
    |----|   |----|    |----|----|
    |ip2 |~~~|ip2`|    |ip2`|    |
    |----|   |----|    |----|----|
    |ip1 |~~~|ip1`|    |ip1`|ip0`|
    |----|   |----|    -----------
    |ip0 |~~~|ip0`|<---------'
    ------   ------
1. if (sym(ipj) == sym(ipj`)), j=0-3 ===> user PSS
2. ipj` , j=4-7 ===> user PSS
Augmented PSS = A_SWS(SWS(THS(K)), HWS):

                            user/
            user PSS      kernel PSS
             ------         ------
             |ip7`|         |ip7`|<-user PSS bottom
             |----|         |----|
             |ip6`|         |ip6`|
             |----|         |----|
      HWS    |ip5`|         |ip5`|
             |----|         |----|
             |ip4`|         |ip4`|
             ------         ------
             |ip3 |         |ip3 |
             |----|         |----|
      SWS    |ip2 |         |ip2 |
             |----|         |----|
             |ip1 |         |ip1 |
             |----|         |----|
             |ip0 |         |ip0 |<-user PSS top
             ------         ------
                            |kip0|<-kernel PSS bottom
                            |----|
                            |kip1|
                    K-SWS   |----|
                            |kip2|
                            |----|
                            |kip3|<-kernel PSS top
                            ------
                             APSS
===========================================================================
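
For illustration only, steps 1 and 2 of the consolidation algorithm above can
be sketched in Python. This is not code from the patch set or from perf: the
names augment_user_stack and toy_sym, the plain-list stack representation and
the fallback to the unmodified dwarf stack on a symbol mismatch are all
assumptions made for this sketch.

```python
from typing import Callable, List, Sequence


def augment_user_stack(
    sws: Sequence[int],
    hws: Sequence[int],
    sym: Callable[[int], str],
) -> List[int]:
    """Return the SWS frames extended with the outer HWS frames.

    Both stacks are ordered from the stack top (innermost frame, index 0)
    towards the stack bottom, matching ip0, ip1, ... in the diagrams.
    """
    overlap = min(len(sws), len(hws))

    # Step 1: frames j = 0..overlap-1 must resolve to the same symbols in
    # both stacks; otherwise they do not describe the same call chain.
    for j in range(overlap):
        if sym(sws[j]) != sym(hws[j]):
            # Assumption of this sketch: on a mismatch keep the dwarf stack.
            return list(sws)

    # Step 2: append the HWS frames that lie beyond the unwound region.
    return list(sws) + list(hws[overlap:])


# Hypothetical symbol resolver: every 0x1000-sized range is one function.
def toy_sym(ip: int) -> str:
    return f"func_{ip >> 12}"


sws = [0x1008, 0x2010, 0x3004, 0x4020]                  # ip0..ip3 (dwarf)
hws = [0x1000 + 0x1000 * j + 0x40 for j in range(8)]    # ip0`..ip7` (LBR)
augmented = augment_user_stack(sws, hws, toy_sym)
print([toy_sym(ip) for ip in augmented])
```

Comparing symbols rather than raw addresses reflects step 1: the dwarf unwinder
and the LBR report different addresses within the same function, so only the
resolved symbols are expected to match.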
---
Alexey Budankov (3):
  perf record: enable LBR callstack capture jointly with thread stack
  perf report: dump LBR callstack data by -D jointly with thread stack
  perf report: prefer dwarf callstacks to LBR ones when captured both

 tools/perf/builtin-report.c            |  2 ++
 tools/perf/util/parse-branch-options.c |  1 +
 tools/perf/util/session.c              | 31 ++++++++++++++++----------
 3 files changed, 22 insertions(+), 12 deletions(-)

--
2.20.1