lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 23 Jan 2024 12:09:13 +0200
From: Adrian Hunter <adrian.hunter@...el.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Heiko Carstens <hca@...ux.ibm.com>, Thomas Richter <tmricht@...ux.ibm.com>,
 Hendrik Brueckner <brueckner@...ux.ibm.com>,
 Suzuki K Poulose <suzuki.poulose@....com>, Mike Leach
 <mike.leach@...aro.org>, James Clark <james.clark@....com>,
 coresight@...ts.linaro.org, linux-arm-kernel@...ts.infradead.org,
 Yicong Yang <yangyicong@...ilicon.com>,
 Jonathan Cameron <jonathan.cameron@...wei.com>, Will Deacon
 <will@...nel.org>, Arnaldo Carvalho de Melo <acme@...nel.org>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH V4 01/11] perf/core: Add aux_pause, aux_resume,
 aux_start_paused

On 19/01/24 23:40, Namhyung Kim wrote:
> On Thu, Jan 11, 2024 at 12:19 AM Adrian Hunter <adrian.hunter@...el.com> wrote:
>>
>> Hardware traces, such as instruction traces, can produce a vast amount of
>> trace data, so being able to reduce tracing to more specific circumstances
>> can be useful.
>>
>> The ability to pause or resume tracing when another event happens, can do
>> that.
>>
>> Add ability for an event to "pause" or "resume" AUX area tracing.
>>
>> Add aux_pause bit to perf_event_attr to indicate that, if the event
>> happens, the associated AUX area tracing should be paused. Ditto
>> aux_resume. Do not allow aux_pause and aux_resume to be set together.
>>
>> Add aux_start_paused bit to perf_event_attr to indicate to an AUX area
>> event that it should start in a "paused" state.
>>
>> Add aux_paused to struct perf_event for AUX area events to keep track of
>> the "paused" state. aux_paused is initialized to aux_start_paused.
>>
>> Add PERF_EF_PAUSE and PERF_EF_RESUME modes for ->stop() and ->start()
>> callbacks. Call as needed, during __perf_event_output(). Add
>> aux_in_pause_resume to struct perf_buffer to prevent races with the NMI
>> handler. Pause/resume in NMI context will miss out if it coincides with
>> another pause/resume.
>>
>> To use aux_pause or aux_resume, an event must be in a group with the AUX
>> area event as the group leader.
>>
>> Example (requires Intel PT and tools patches also):
>>
>>  $ perf record --kcore -e intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/ uname
>>  Linux
>>  [ perf record: Woken up 1 times to write data ]
>>  [ perf record: Captured and wrote 0.043 MB perf.data ]
>>  $ perf script --call-trace
>>  uname   30805 [000] 24001.058782799: name: 0x7ffc9c1865b0
>>  uname   30805 [000] 24001.058784424:  psb offs: 0
>>  uname   30805 [000] 24001.058784424:  cbr: 39 freq: 3904 MHz (139%)
>>  uname   30805 [000] 24001.058784629: ([kernel.kallsyms])        debug_smp_processor_id
>>  uname   30805 [000] 24001.058784629: ([kernel.kallsyms])        __x64_sys_newuname
>>  uname   30805 [000] 24001.058784629: ([kernel.kallsyms])            down_read
>>  uname   30805 [000] 24001.058784629: ([kernel.kallsyms])                __cond_resched
>>  uname   30805 [000] 24001.058784629: ([kernel.kallsyms])                preempt_count_add
>>  uname   30805 [000] 24001.058784629: ([kernel.kallsyms])                    in_lock_functions
>>  uname   30805 [000] 24001.058784629: ([kernel.kallsyms])                preempt_count_sub
>>  uname   30805 [000] 24001.058784629: ([kernel.kallsyms])            up_read
>>  uname   30805 [000] 24001.058784629: ([kernel.kallsyms])                preempt_count_add
>>  uname   30805 [000] 24001.058784838: ([kernel.kallsyms])                    in_lock_functions
>>  uname   30805 [000] 24001.058784838: ([kernel.kallsyms])                preempt_count_sub
>>  uname   30805 [000] 24001.058784838: ([kernel.kallsyms])            _copy_to_user
>>  uname   30805 [000] 24001.058784838: ([kernel.kallsyms])        syscall_exit_to_user_mode
>>  uname   30805 [000] 24001.058784838: ([kernel.kallsyms])            syscall_exit_work
>>  uname   30805 [000] 24001.058784838: ([kernel.kallsyms])                perf_syscall_exit
>>  uname   30805 [000] 24001.058784838: ([kernel.kallsyms])                    debug_smp_processor_id
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                    perf_trace_buf_alloc
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                        perf_swevent_get_recursion_context
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                            debug_smp_processor_id
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                        debug_smp_processor_id
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                    perf_tp_event
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                        perf_trace_buf_update
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                            tracing_gen_ctx_irq_test
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                        perf_swevent_event
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                            __perf_event_account_interrupt
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                                __this_cpu_preempt_check
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                            perf_event_output_forward
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                                perf_event_aux_pause
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                                    ring_buffer_get
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                                        __rcu_read_lock
>>  uname   30805 [000] 24001.058785046: ([kernel.kallsyms])                                        __rcu_read_unlock
>>  uname   30805 [000] 24001.058785254: ([kernel.kallsyms])                                    pt_event_stop
>>  uname   30805 [000] 24001.058785254: ([kernel.kallsyms])                                        debug_smp_processor_id
>>  uname   30805 [000] 24001.058785254: ([kernel.kallsyms])                                        debug_smp_processor_id
>>  uname   30805 [000] 24001.058785254: ([kernel.kallsyms])                                        native_write_msr
>>  uname   30805 [000] 24001.058785463: ([kernel.kallsyms])                                        native_write_msr
>>  uname   30805 [000] 24001.058785639: 0x0
> 
> Looks great!  I think this is very similar to what Kees asked in
> 
>   https://lore.kernel.org/linux-perf-users/202401091452.B73E21B6C@keescook/

Sometimes a precisely-defined workload is needed, just so that
running it repeatedly does not produce results that vary too much
to tell whether one software version is better than another.

> 
> I have a couple of basic questions:
>  * Can we do that for regular events too?

That would be much more complicated.  The current implementation
can only pause / resume 1 event, the group leader, and it has to
be supported by the PMU callbacks.

>  * What's the difference between start/stop and pause/resume?
>    (IOW can we do that just using start/stop callbacks?)

It is using start / stop callbacks, albeit with a different mode
parameter.  However pause / resume is not allowed unless the event
has been started and not stopped, so it is a different state.

> 
> Actually I was thinking about dropping samples using a BPF filter
> outside the target scope (e.g. a syscall) but it'd be nice if we can
> have builtin support for that.

In general, I would have thought that capturing samples does not
produce so much data that it cannot be filtered in post-processing.
Looking at the email thread from above, that seems to be what
Arnaldo has proposed.

AUX area tracing is different in this regard.  Intel PT can produce
more trace data than can be written out in time, so data will be
lost for large traces.  Also post-processing takes a long time, so
less data captured helps a lot there also.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ