linux-kernel - Re: [PATCH] perf record: Allow passing perf's own pid to '--filter'

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 6 Jul 2015 10:56:50 -0300
From:	Arnaldo Carvalho de Melo <acme@...nel.org>
To:	Wang Nan <wangnan0@...wei.com>
Cc:	a.p.zijlstra@...llo.nl, mingo@...hat.com, jolsa@...nel.org,
	peterz@...radead.org, namhyung@...nel.org, kan.liang@...el.com,
	adrian.hunter@...el.com, ak@...ux.intel.com,
	cody@...ux.vnet.ibm.com, jacob.w.shin@...il.com,
	standby24x7@...il.com, lizefan@...wei.com, yunlong.song@...wei.com,
	rostedt@...dmis.org, linux-kernel@...r.kernel.org, pi3orama@....com
Subject: Re: [PATCH] perf record: Allow passing perf's own pid to '--filter'

Em Mon, Jul 06, 2015 at 04:17:31AM +0000, Wang Nan escreveu:
> This patch allows passing perf's own PID to '--filter' by using
> '@...FPID'. This should be useful when system-widely capturing
> tracepoints events.

Steven, does filters have any special meaning for @?
 
> Before this patch, when doing something like:
> 
>  # perf record -a -e syscalls:sys_enter_write <cmd>
> 
> One could easily get result like this:
> 
>  # /tmp/perf report --stdio
>  ...
>  # Overhead  Command  Shared Object       Symbol
>  # ........  .......  ..................  ....................
>  #
>      99.99%  perf     libpthread-2.18.so  [.] __write_nocancel
>      0.01%   ls       libc-2.18.so        [.] write
>      0.01%   sshd     libc-2.18.so        [.] write
>      ...
> 
> Where most events are generated by perf itself.
> 
> A shell trick can be done to filter perf itself out:
> 
>  # cat << EOF > ./tmp
>  > #!/bin/sh
>  > exec perf record -e ... --filter="common_pid != \$\$" -a sleep 10
>  > EOF
>  # chmod a+x ./tmp
>  # ./tmp
> 
> However, doing so is user unfriendly.
> 
> This patch introduces '@...FPID' placeholder to '--filter' options. Now
> user is allowed to the above work with:
> 
>   # perf record -e ... --filter="common_pid != @PERFPID' sleep 10
> 
> Signed-off-by: Wang Nan <wangnan0@...wei.com>
> ---
>  tools/perf/Documentation/perf-record.txt |  1 +

<SNIP>

> +++ b/tools/perf/util/parse-events.c
> @@ -1175,6 +1175,101 @@ int parse_events_option(const struct option *opt, const char *str,
>  	return ret;
>  }
>  
> +#ifndef PAGE_SIZE
> +# define PAGE_SIZE 4096
> +#endif

You can use 'page_size', its available and filled via:

   page_size = sysconf(_SC_PAGE_SIZE);

early in perf's main() routine.

> +static int
> +postproc_filter_append_token(const char *key, char *new_filter,
> +			     ssize_t *pspace)
> +{
> +	if (strcmp(key, "PERFPID") == 0) {
> +		char pid_buf[32];
> +		pid_t self_pid = getpid();
> +
> +		snprintf(pid_buf, 32, "%d", self_pid);

		snprintf(pid_buf, sizeof(pid_buf), "%d", self_pid);

> +		strncat(new_filter, pid_buf, *pspace);
> +		*pspace -= strlen(pid_buf);
> +		if (*pspace < 0)
> +			return -1;
> +		return 0;
> +	}
> +
> +	return -1;

but then, please take a look at my perf/core branch, by coincidence I
worked on having multiple filters set on a evsel in 'perf trace', where
at a minimum, the tools' pid is added to the filter for all tracepoints
used, i.e.:

     commom_pid != getpid()

is always present, to avoid a feedback loop, neverending tracing of
syscalls generated by the tracer itself.

Then, if you use --filter-pids PID-1,PID-2,PID3, it will create an
expression with that first part, for things like gnome-terminal, xorg,
etc.

Now we need to keep that in place and if the user uses -e to specify
which syscalls it wants (or wants filtered out), we need to again
concatenate with that commom_pid list, so that we call the filter ioctl
just once, else the kernel returns EEXIST.

Because I needed to append, etc, there are new functions there for
go on creating such expressions, please use them, its all in my
perf/core branch, the latest patches.

I.e. having something in the filter expression that gets transformed
into the tools' pid, I have no problem with that, just curious about
what would be the best character to signal that a substitution needs to
be performed, if it is really '@...', as my first selection would be
'$VAR', but then I haven't looked deeply at ftrace's filter stuff to see
if it has provision for substitution in the kernel, etc.

Andi also did this at some point, forgot why that wasn't applied at the
time :-\

For reference:

https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit/?h=perf/core&id=f47805a2af3ba83881ca52434bbbc6e9886b72fd

- Arnaldo

> +}
> +
> +static void postproc_filter(struct perf_evsel *evsel)
> +{
> +	char *at = NULL, *sep = NULL, *old_filter, *new_filter;
> +	ssize_t space;
> +
> +	if (!evsel->filter)
> +		return;
> +
> +	old_filter = evsel->filter;
> +	at = strchr(old_filter, '@');
> +	if (!at)
> +		return;
> +
> +	/*
> +	 * See perf_event_set_filter(). Length of a valid filter is
> +	 * limited by PAGE_SIZE.
> +	 */
> +	new_filter = malloc(PAGE_SIZE);
> +	if (!new_filter) {
> +		fprintf(stderr, "No enough memory for post proc filter '%s'\n",
> +			old_filter);
> +		return;
> +	}
> +	*new_filter = '\0';
> +	space = PAGE_SIZE - 1;
> +
> +	while (1) {
> +		if (at)
> +			*at = '\0';
> +		strncat(new_filter, old_filter, space);
> +		space -= strlen(old_filter);
> +		if (space < 0)
> +			goto errout;
> +		if (!at)
> +			break;
> +		*at = '@';
> +
> +		sep = strchr(at + 1, ' ');
> +		if (sep)
> +			*sep = '\0';
> +
> +		if (postproc_filter_append_token(at + 1, new_filter, &space))
> +			goto errout;
> +
> +		if (!sep)
> +			break;
> +		*sep = ' ';
> +
> +		old_filter = sep;
> +		at = strchr(old_filter, '@');
> +	}
> +
> +	free(evsel->filter);
> +	/*
> +	 * It is safe to use new_filter directly. However, try to
> +	 * release some memory by strdup() a smaller string and free
> +	 * new_filter, which takes a full page.
> +	 */
> +	evsel->filter = strdup(new_filter);
> +	if (!evsel->filter)
> +		evsel->filter = new_filter;
> +	else
> +		free(new_filter);
> +	return;
> +errout:
> +	if (at)
> +		*at = '@';
> +	if (sep)
> +		*sep = ' ';
> +	fprintf(stderr, "Can't post proc filter '%s'\n", evsel->filter);
> +	free(new_filter);
> +}
> +
>  int parse_filter(const struct option *opt, const char *str,
>  		 int unset __maybe_unused)
>  {
> @@ -1196,6 +1291,7 @@ int parse_filter(const struct option *opt, const char *str,
>  		return -1;
>  	}
>  
> +	postproc_filter(last);
>  	return 0;
>  }
>  
> -- 
> 1.8.3.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/