lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87lhvft6vm.fsf@sejong.aot.lge.com>
Date:	Wed, 09 Apr 2014 14:21:49 +0900
From:	Namhyung Kim <namhyung@...il.com>
To:	Don Zickus <dzickus@...hat.com>
Cc:	acme@...stprotocols.net, LKML <linux-kernel@...r.kernel.org>,
	jolsa@...hat.com, jmario@...hat.com, fowles@...each.com,
	peterz@...radead.org, eranian@...gle.com, andi.kleen@...el.com
Subject: Re: [PATCH 4/6 V2] perf, sort:  Add physid sorting based on mmap2 data

On Mon, 24 Mar 2014 16:57:18 -0400, Don Zickus wrote:
> In order for the c2c tool to work correctly, it needs to properly
> sort all the records on uniquely identifiable data addresses.  These
> unique addresses are converted from virtual addresses provided by the
> hardware into a kernel address using an mmap2 record as the decoder.
>
> Once a unique address is converted, we can sort on them based on
> various rules.  Then it becomes clear which address are overlapping
> with each other across mmap regions or pid spaces.
>
> This patch just creates the rules and inserts the records into a
> sort entry for safe keeping until later patches process them.
>
> The general sorting rule is:
>
> o group cpumodes together
> o if (nonzero major/minor number - ie mmap'd areas)
>   o sort on major, minor, inode, inode generation numbers
> o else if cpumode is not kernel
>   o sort on pid
> o sort on data addresses
>
> I also hacked in the concept of 'color'.  The purpose of that bit is to
> provides hints later when processing these records that indicate a new unique
> address has been encountered.  Because later processing only checks the data
> addresses, there can be a theoretical scenario that similar sequential data
> addresses (when walking the rbtree) could be misinterpreted as overlapping
> when in fact they are not.
>
> Sample output: (perf report --stdio --physid-mode)
>
>   Overhead            Data Address            Source Address    Command:  Pid    Tid Major  Minor  Inode  Inode Gen
>   ........  ......................  ........................ ................. ..... .....  ..... ....... .........
>     18.93%  [k] 0xffffc900139c40b0  [k] igb_update_stats     kworker/0:1:  257   257     0      0       0         0
>      7.63%  [k] 0xffff88082e6cf0a8  [k] watchdog_timer_fn        swapper:    0     0     0      0       0         0
>      1.86%  [k] 0xffff88042ef94700  [k] _raw_spin_lock           swapper:    0     0     0      0       0         0
>      1.77%  [k] 0xffff8804278afa50  [k] __switch_to              swapper:    0     0     0      0       0         0
>
> V4: add manpage entry in perf-report
>
> V3: split out the sorting into unique entries.  This makes it look
>       far less ugly
>     create a new 'physid mode' to group all the sorting rules together
>       (mimics the mem-mode)

What is 'physid' then?  I guess you meant physical id but it seems
unique id or unique map id looks like a better fit IMHO.

>
> Signed-off-by: Don Zickus <dzickus@...hat.com>
> ---
>  tools/perf/Documentation/perf-report.txt |  23 +++
>  tools/perf/builtin-report.c              |  20 ++-
>  tools/perf/util/hist.c                   |  27 ++-
>  tools/perf/util/hist.h                   |   8 +
>  tools/perf/util/sort.c                   | 294 +++++++++++++++++++++++++++++++
>  tools/perf/util/sort.h                   |  13 ++
>  6 files changed, 381 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
> index 8eab8a4..01391b0 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -95,6 +95,23 @@ OPTIONS
>  	And default sort keys are changed to comm, dso_from, symbol_from, dso_to
>  	and symbol_to, see '--branch-stack'.
>  
> +	If --physid-mode option is used, following sort keys are also
> +	available:
> +	daddr, iaddr, pid, tid, major, minor, inode, inode_gen.
> +
> +	- daddr: data address (sorted based on major, minor, inode and inode
> +		generation numbers if shared, otherwise pid)

By "if shared", did you mean "for shared file mapping"?


> +	- iaddr: instruction address
> +	- pid: command and pid of the task
> +	- tid: tid of the task
> +	- major: major number of mapped location (0 if not mapped)
> +	- minor: minor number of mapped location (0 if not mapped)
> +	- inode: inode number of mapped location (0 if not mapped)
> +	- inode_gen: inode generation number of mapped location (0 if not mapped)

s/if not mapped/if not file-mapped/ ?

> +
> +	And default sort keys are changed to daddr, iaddr, pid, tid, major,
> +	minor, inode and inode_gen, see '--physid-mode'.
> +
>  -p::
>  --parent=<regex>::
>          A regex filter to identify parent. The parent is a caller of this
> @@ -223,6 +240,12 @@ OPTIONS
>  	branch stacks and it will automatically switch to the branch view mode,
>  	unless --no-branch-stack is used.
>  
> +--physid-mode::
> +	Use the data addresses sampled using perf record -d and combine them
> +	with the mmap'd area region where they are located.  This helps identify
> +	which data addresses collide with similar addresses in another process
> +	space.  See --sort for output choices.
> +
>  --objdump=<path>::
>          Path to objdump binary.
>  
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index c87412b..093f5ad 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -49,6 +49,7 @@ struct report {
>  	bool			show_threads;
>  	bool			inverted_callchain;
>  	bool			mem_mode;
> +	bool			physid_mode;
>  	bool			header;
>  	bool			header_only;
>  	int			max_stack;
> @@ -241,7 +242,7 @@ static int process_sample_event(struct perf_tool *tool,
>  		ret = report__add_branch_hist_entry(rep, &al, sample, evsel);
>  		if (ret < 0)
>  			pr_debug("problem adding lbr entry, skipping event\n");
> -	} else if (rep->mem_mode == 1) {
> +	} else if ((rep->mem_mode == 1) || (rep->physid_mode)) {

As you can see rep->mem_mode is also a boolean field.  Please change it
like:

	} else if (rep->mem_mode || rep->physid_mode) {


>  		ret = report__add_mem_hist_entry(rep, &al, sample, evsel);
>  		if (ret < 0)
>  			pr_debug("problem adding mem entry, skipping event\n");
> @@ -746,6 +747,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>  	OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
>  		    "Disable symbol demangling"),
>  	OPT_BOOLEAN(0, "mem-mode", &report.mem_mode, "mem access profile"),
> +	OPT_BOOLEAN(0, "physid-mode", &report.physid_mode, "physid access profile"),
>  	OPT_CALLBACK(0, "percent-limit", &report, "percent",
>  		     "Don't show entries under that percent", parse_percent_limit),
>  	OPT_END()
> @@ -817,6 +819,22 @@ repeat:
>  			sort_order = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
>  	}
>  
> +	if (report.physid_mode) {
> +		if ((sort__mode == SORT_MODE__BRANCH) ||
> +		    (sort__mode == SORT_MODE__MEMORY)) {
> +			pr_err("branch or memory and physid mode incompatible\n");
> +			goto error;
> +		}
> +		sort__mode = SORT_MODE__PHYSID;
> +
> +		/*
> +		 * if no sort_order is provided, then specify
> +		 * branch-mode specific order

s/branch-mode/physid-mode/

It looks mem-mode has the same copy-n-paste problem.


> +		 */
> +		if (sort_order == default_sort_order)
> +			sort_order = "daddr,iaddr,pid,tid,major,minor,inode,inode_gen";

So if the 'daddr' key already checks major, minor, inode and inode_gen
by itself why do we need to add those sort keys again?


> +	}
> +
>  	if (setup_sorting() < 0) {
>  		parse_options_usage(report_usage, options, "s", 1);
>  		goto error;


[SNIP]
> +static int64_t
> +sort__physid_daddr_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> +	u64 l, r;
> +	struct map *l_map = left->mem_info->daddr.map;
> +	struct map *r_map = right->mem_info->daddr.map;
> +
> +	/* store all NULL mem maps at the bottom */
> +	/* shouldn't even need this check, should have stubs */
> +	if (!left->mem_info->daddr.map || !right->mem_info->daddr.map)
> +		return 1;

You might want to use 'return cmp_null(l_map, r_map);' here.

> +
> +	/* group event types together */
> +	if (left->cpumode > right->cpumode) return -1;
> +	if (left->cpumode < right->cpumode) return 1;
> +
> +	/*
> +	 * Addresses with no major/minor numbers are assumed to be
> +	 * anonymous in userspace.  Sort those on pid then address.
> +	 *
> +	 * The kernel and non-zero major/minor mapped areas are
> +	 * assumed to be unity mapped.  Sort those on address then pid.
> +	 */
> +
> +	if (l_map->maj || l_map->min || l_map->ino || l_map->ino_generation) {
> +		/* mmapped areas */
> +
> +		if (l_map->maj > r_map->maj) return -1;
> +		if (l_map->maj < r_map->maj) return 1;
> +
> +		if (l_map->min > r_map->min) return -1;
> +		if (l_map->min < r_map->min) return 1;
> +
> +		if (l_map->ino > r_map->ino) return -1;
> +		if (l_map->ino < r_map->ino) return 1;
> +
> +		if (l_map->ino_generation > r_map->ino_generation) return -1;
> +		if (l_map->ino_generation < r_map->ino_generation) return 1;
> +
> +	} else if (left->cpumode != PERF_RECORD_MISC_KERNEL) {
> +		/* userspace anonymous */
> +		if (left->thread->pid_ > right->thread->pid_) return -1;
> +		if (left->thread->pid_ < right->thread->pid_) return 1;
> +	}
> +
> +	/* hack to mark similar regions, 'right' is new entry */
> +	right->color = TRUE;

I don't understand the logic behind the 'color'.  It seems just mark
every samples except first one on a same file (or same pid for anon map)
indicating that those accesses are for distinct maps, right?

I don't know how it could help to distinguish whether an access is for a
same map or different map.  For the userspace anon map case, why doesn't
it check the start addresses of l_map and r_map?

I'm feeling ignorant.. :-(

> +
> +	/* al_addr does all the right addr - start + offset calculations */
> +	l = left->mem_info->daddr.al_addr;
> +	r = right->mem_info->daddr.al_addr;
> +
> +	if (l > r) return -1;
> +	if (l < r) return 1;
> +
> +	/* sanity check the maps; only mmaped areas should have different maps */
> +	if ((left->mem_info->daddr.map != right->mem_info->daddr.map) &&
> +	     !right->mem_info->daddr.map->maj && !right->mem_info->daddr.map->min)
> +		pr_debug("physid_cmp: Similar entries have different maps\n");
> +
> +	return 0;
> +}


[SNIP]
> @@ -1104,6 +1382,22 @@ int sort_dimension__add(const char *tok)
>  		return 0;
>  	}
>  
> +	for (i = 0; i < ARRAY_SIZE(physid_sort_dimensions); i++) {
> +		struct sort_dimension *sd = &physid_sort_dimensions[i];
> +
> +		if (strncasecmp(tok, sd->name, strlen(tok)))
> +			continue;
> +
> +		if (sort__mode != SORT_MODE__PHYSID)
> +			return -EINVAL;
> +
> +		if (sd->entry == &sort_physid_daddr)
> +			sort__has_sym = 1;

I think it's not needed.  The sort__has_sym is for doing annotate during
report/top session and it only works for symbol (i.e. function) basis.

Thanks,
Namhyung

> +
> +		__sort_dimension__add(sd, i + __SORT_PHYSID_MODE);
> +		return 0;
> +	}
> +
>  	return -ESRCH;
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ