lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <586de5fc-858b-2693-1986-5c77e8c0e3d0@intel.com>
Date:   Fri, 11 Mar 2022 11:10:30 +0200
From:   Adrian Hunter <adrian.hunter@...el.com>
To:     "Steinar H. Gunderson" <sesse@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>
Cc:     Jiri Olsa <jolsa@...hat.com>, Namhyung Kim <namhyung@...nel.org>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf intel-pt: Synthesize cycle events

On 10.3.2022 11.38, Steinar H. Gunderson wrote:
> There is no good reason why we cannot synthesize "cycle" events
> from Intel PT just as we can synthesize "instruction" events,
> in particular when CYC packets are available. This enables using
> PT to getting much more accurate cycle profiles than regular sampling
> (record -e cycles) when the work last for very short periods (<10 ms).
> Thus, add support for this, based off of the existing IPC calculation
> framework. The new option to --itrace is "y" (for cYcles), as c was
> taken for calls. Cycle and instruction events can be synthesized
> together, and are by default.
> 
> The only real caveat is that CYC packets are only emitted whenever
> some other packet is, which in practice is when a branch instruction
> is encountered. Thus, even at no subsampling (e.g. --itrace=y0ns),
> it is impossible to get more accuracy than a single basic block, and all
> cycles spent executing that block will get attributed to the branch
> instruction that ends it. Thus, one cannot know whether the cycles came
> from e.g. a specific load, a mispredicted branch, or something else.
> When subsampling (which is the default), the cycle events will get
> smeared out even more, but will still be useful to attribute cycle
> counts to functions.
> 
> Signed-off-by: Steinar H. Gunderson <sesse@...gle.com>
> ---
>  tools/perf/Documentation/itrace.txt        |  3 +-
>  tools/perf/Documentation/perf-intel-pt.txt | 33 +++++++-----
>  tools/perf/util/auxtrace.c                 |  9 +++-
>  tools/perf/util/auxtrace.h                 |  7 ++-
>  tools/perf/util/intel-pt.c                 | 59 +++++++++++++++++++---
>  5 files changed, 88 insertions(+), 23 deletions(-)
> 

<SNIP>

> -static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
> +static int intel_pt_synth_instruction_or_cycle_sample(struct intel_pt_queue *ptq)
>  {
>  	struct intel_pt *pt = ptq->pt;
>  	union perf_event *event = ptq->event_buf;
>  	struct perf_sample sample = { .ip = 0, };
> +	int err;
>  
>  	if (intel_pt_skip_event(pt))
>  		return 0;
> @@ -1633,7 +1639,7 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
>  	else
>  		sample.period = ptq->state->tot_insn_cnt - ptq->last_insn_cnt;
>  
> -	if (ptq->sample_ipc)
> +	if (ptq->sample_ipc || pt->sample_cycles)

This is not quite right.  ptq->sample_ipc is set to indicate when the
cycle count is accurate for the current instruction.  It can be weakened
by using "Approx IPC" which was introduced for dlfilter-show-cycles.
Probably that approach should be followed for a "cycles" event also.

>From perf-intel-pt man page:

dlfilter-show-cycles.so
~~~~~~~~~~~~~~~~~~~~~~~

Cycles can be displayed using dlfilter-show-cycles.so in which case the itrace A
option can be useful to provide higher granularity cycle information:

	perf script --itrace=A --call-trace --dlfilter dlfilter-show-cycles.so

To see a list of dlfilters:

	perf script -v --list-dlfilters

See also linkperf:perf-dlfilters[1]


>  		sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_in_cyc_cnt;
>  	if (sample.cyc_cnt) {
>  		sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_in_insn_cnt;
> @@ -1643,8 +1649,30 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
>  
>  	ptq->last_insn_cnt = ptq->state->tot_insn_cnt;

There are variables here that are specific to the "instructions" event, so
mixing "cycles" with "instructions" means duplicating those, however maybe
it would be better not to allow "y" and "i" options at the same time?

>  
> -	return intel_pt_deliver_synth_event(pt, event, &sample,
> -					    pt->instructions_sample_type);
> +	if (pt->sample_instructions) {
> +		err = intel_pt_deliver_synth_event(pt, event, &sample,
> +						   pt->instructions_sample_type);
> +		if (err)
> +			return err;
> +	}
> +
> +	/*
> +	 * NOTE: If not doing sampling (e.g. itrace=y0us), we will in practice
> +	 * only see cycles being attributed to branches, since CYC packets
> +	 * only are emitted together with other packets are emitted.
> +	 * We should perhaps consider spreading it out over everything since
> +	 * the last CYC packet, ie., since last time sample.cyc_cnt was nonzero.
> +	 */
> +	if (pt->sample_cycles && sample.cyc_cnt) {
> +		sample.id = ptq->pt->cycles_id;
> +		sample.stream_id = ptq->pt->cycles_id;

A "cycles" sample needs to set the sample period to the number of cycles since the
last "cycles" sample.

> +		err = intel_pt_deliver_synth_event(pt, event, &sample,
> +						   pt->cycles_sample_type);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
>  }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ