Message-ID: <ea9a04ad-26b2-7072-9f45-9ddbd8f61c10@intel.com>
Date:   Mon, 28 Jun 2021 10:23:18 +0300
From:   Adrian Hunter <adrian.hunter@...el.com>
To:     Andi Kleen <ak@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:     Jiri Olsa <jolsa@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Namhyung Kim <namhyung@...nel.org>,
        Leo Yan <leo.yan@...aro.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 00/10] perf script: Add API for filtering via
 dynamically loaded shared object

On 27/06/21 7:13 pm, Andi Kleen wrote:
> 
> On 6/27/2021 6:18 AM, Adrian Hunter wrote:
>> Hi
>>
>> In some cases, users want to filter very large amounts of data
>> (e.g. from AUX area tracing like Intel PT) looking for something
>> specific. While scripting such as Python can be used, Python is 10
>> to 20 times slower than C. So define a C API so that custom filters
>> can be written and loaded.
> 
> While I appreciate this for complex cases, in my experience filtering
> is usually just a simple expression. It would be nice to also have a
> way to do this reasonably fast without having to write a custom C

I do not agree that writing C filters is a hassle. For example, a minimal
do-nothing filter is only a few lines:

#include <perf/perf_dlfilter.h>

int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
	return 0;
}

(Actually, the filter program does not have to have any LOC at all, but that
is not much of an example)
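
To illustrate the "simple expression" case, here is a rough sketch of a filter
that keeps only samples from one CPU. It is not taken from the patch set. The
struct is a local stand-in so the sketch compiles without the perf headers (a
real filter would include <perf/perf_dlfilter.h> and take
struct perf_dlfilter_sample). WANTED_CPU and keep_sample() are hypothetical
names, and the return convention (0 keeps the event, 1 filters it out) is an
assumption that should be checked against the dlfilter documentation.

```c
#include <stdint.h>

/*
 * Stand-in for the two sample fields used in this thread, so the sketch
 * builds on its own. A real filter would use struct perf_dlfilter_sample
 * from <perf/perf_dlfilter.h> instead.
 */
struct sample_stub {
	int32_t cpu;
	uint64_t cyc_cnt;
};

#define WANTED_CPU 2	/* hypothetical: the only CPU we want to see */

/* The "simple expression": non-zero when the sample should be kept */
static int keep_sample(const struct sample_stub *sample)
{
	return sample->cpu == WANTED_CPU;
}

/* Assumed convention: return 0 to keep the event, 1 to filter it out */
int filter_event(void *data, const struct sample_stub *sample, void *ctx)
{
	(void)data;
	(void)ctx;
	return keep_sample(sample) ? 0 : 1;
}
```

Changing the filter then only means editing the one-line expression in
keep_sample().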

Additionally, a script to do the build is fairly trivial, e.g. I use this:

$ cat `which make-dlfilter.sh`
#!/bin/bash

set -ex

if test -z "${1}" ; then
        echo "Name required"
        exit 1
fi

name="${1%.c}"

if test "${name}" = "${1}" ; then
        name="${1%.so}"
fi

gcc -c -I ~/include -fpic "${name}.c"

gcc -shared -o "${name}.so" "${name}.o"


> file.   Is the 10x-20x overhead just the python interpreter, or is it
> related to perf?


AFAICT the Python C API used to interface to Python performs fairly similarly
to the Python interpreter.

>                  Maybe we could have some kind of python fast path
> just for filters?

I expect there are ways to make it more efficient, but I doubt it would ever
come close to C.

> Or maybe the alternative would be to have a
> frontend in perf that can automatically generate/compile such a C
> filter based on a simple expression, but I'm not sure if that would
> be much simpler.

If gcc is available, perf script could, in fact, build the .so on the fly
since the compile time is very quick.
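
As a rough sketch of what building on the fly might involve (not something
perf script does today, as far as this thread goes), the caller could format a
single-step gcc command equivalent to the script above and run it. The
function names dlfilter_build_cmd() and dlfilter_build() are hypothetical, and
the include path is simply the one from my script.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Format a one-step compile-and-link command for "<name>.c" into buf.
 * Returns 0 on success, -1 if buf is too small.
 * Note: name is pasted into a shell command unquoted, so it must come
 * from a trusted source.
 */
int dlfilter_build_cmd(char *buf, size_t size, const char *name)
{
	int n = snprintf(buf, size,
			 "gcc -shared -fpic -I ~/include -o %s.so %s.c",
			 name, name);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Run the command; returns 0 only if the shell reports success */
int dlfilter_build(const char *name)
{
	char cmd[512];

	if (dlfilter_build_cmd(cmd, sizeof(cmd), name))
		return -1;
	return system(cmd) == 0 ? 0 : -1;
}
```

The resulting .so would then be loaded the same way as a pre-built filter.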

Another point is that filters can be used for more than just filtering.
Here is an example that sums cycles per CPU and, at the beginning of each
line, prints the running total and the difference since the last print.  I
think this was something you were interested in doing?


#include <perf/perf_dlfilter.h>
#include <stdio.h>

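#define MAX_CPU 4096

/* Running cycle total per CPU, and the total at the time of the last print */
__u64 cycles[MAX_CPU];
__u64 cycles_rpt[MAX_CPU];

/* Accumulate cycles per CPU for every event seen */
int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
	__s32 cpu = sample->cpu;

	if (cpu >= 0 && cpu < MAX_CPU)
		cycles[cpu] += sample->cyc_cnt;
	return 0;
}

/* Print the running total and the delta since the last print for this CPU */
int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
	__s32 cpu = sample->cpu;

	if (cpu >= 0 && cpu < MAX_CPU) {
		/* Cast so that %llu is correct wherever __u64 is unsigned long */
		printf("%10llu %10llu ",
		       (unsigned long long)cycles[cpu],
		       (unsigned long long)(cycles[cpu] - cycles_rpt[cpu]));
		cycles_rpt[cpu] = cycles[cpu];
	} else {
		/* Unknown CPU: pad 22 columns to keep the output aligned */
		printf("%22s", "");
	}
	return 0;
}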

const char *filter_description(const char **long_description)
{
	return "Print the number of cycles at the start of each line";
}
