lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87pmtshvk5.fsf@stepbren-lnx.us.oracle.com>
Date:   Wed, 01 Sep 2021 13:47:54 -0700
From:   Stephen Brennan <stephen.s.brennan@...cle.com>
To:     Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:     Jiri Olsa <jolsa@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Namhyung Kim <namhyung@...nel.org>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] perf script python: Allow reporting [un]throttle

Arnaldo Carvalho de Melo <acme@...nel.org> writes:
> Em Tue, Aug 31, 2021 at 02:47:47PM -0700, Stephen Brennan escreveu:
>> On 8/31/21 11:46 AM, Jiri Olsa wrote:
>> > On Mon, Aug 16, 2021 at 05:21:33PM -0700, Stephen Brennan wrote:
>> > > perf_events may sometimes throttle an event due to creating too many
>> > > samples during a given timer tick. As of now, the perf tool will not
>> > > report on throttling, which means this is a silent error. Implement a
>> > > callback for the throttle and unthrottle events within the Python
>> > > scripting engine, which can allow scripts to detect and report when
>> > > events may have been lost due to throttling.
>
>> > > A python script could simply define throttle() and unthrottle()
>> > > functions to begin receiving them, e.g.:
>
>> > > ```
>> > > from __future__ import print_function
>
>> > > def process_event(param_dict):
>> > >      print("event cpu={} time={}".format(
>> > >          param_dict["sample"]["cpu"], param_dict["sample"]["time"]))
>
>> > > def throttle(*args):
>> > >      print("throttle(time={}, cpu={}, pid={}, tid={})".format(*args))
>
>> > > def unthrottle(*args):
>> > >      print("unthrottle(time={}, cpu={}, pid={}, tid={})".format(*args))
>> > > ```
>
>> > throttle event has also 'id' and 'stream_id' I guess you don't
>> > need it, but maybe we should add it to be complete
>
>> I tried adding these from struct perf_sample.id and struct
>> perf_sample.sample_id respectively. I then tested these on a perf.data which
>> contains a throttle event. The values which my python script reported were
>> 0xFFFFFFFFFFFFFFFF -- is this expected? If so, I'll send my revised patch.
>
> So, perf_sample.X is only defined if PERF_SAMPLE_X is in
> perf_event_attr.sample_type, and for the perf.data file below it is:
>
>   sample_type = IP|TID|TIME
>
> So only perf_sample.ip, perf_sample.tid and perf_sample.time are set,
> not perf_sample.id nor perf_sample.stream_id.
>
> What Jiri suggests is to use the _fixed_ payload for
> PERF_SAMPLE_THROTTLE/UNTHROTTLE, which is, according to
> include/uapi/linux/perf_event.h:
>
>          * struct {
>          *      struct perf_event_header        header;
>          *      u64                             time;          <<<<
>          *      u64                             id;            <<<<
>          *      u64                             stream_id;     <<<<
>          *      struct sample_id                sample_id;
>          * };
>          */
>         PERF_RECORD_THROTTLE                    = 5,
>         PERF_RECORD_UNTHROTTLE                  = 6,
>  
>
> The 'time', 'id' and 'stream_id' should always be there, while what is
> in 'sample_id' is documented in the same file at the beginning of the
> 'enum perf_event_type' definition:
>
>          * If perf_event_attr.sample_id_all is set then all event types will
>          * have the sample_type selected fields related to where/when
>          * (identity) an event took place (TID, TIME, ID, STREAM_ID, CPU,
>          * IDENTIFIER) described in PERF_RECORD_SAMPLE below, it will be stashed
>          * just after the perf_event_header and the fields already present for
>          * the existing fields, i.e. at the end of the payload. That way a newer
>          * perf.data file will be supported by older perf tools, with these new
>          * optional fields being ignored.
>          *
>          * struct sample_id {
>          *      { u32                   pid, tid; } && PERF_SAMPLE_TID
>          *      { u64                   time;     } && PERF_SAMPLE_TIME
>          *      { u64                   id;       } && PERF_SAMPLE_ID
>          *      { u64                   stream_id;} && PERF_SAMPLE_STREAM_ID
>          *      { u32                   cpu, res; } && PERF_SAMPLE_CPU
>          *      { u64                   id;       } && PERF_SAMPLE_IDENTIFIER
>          * } && perf_event_attr::sample_id_all
>          *
>          * Note that PERF_SAMPLE_IDENTIFIER duplicates PERF_SAMPLE_ID.  The
>          * advantage of PERF_SAMPLE_IDENTIFIER is that its position is fixed
>          * relative to header.size.
>
> There are artifacts here from the evolution of the perf_event subsystem,
> i.e. throttle.id is equal to throttle.sample-id.id when PERF_SAMPLE_ID
> is set in perf_event_attr.sample_type, ditto for 'time' and 'stream_id'.
> entries.
>
> But since 'time', 'id' and 'stream_id' are _always_ set in the
> PERF_RECORD_THROTTLE event, we can as well make them available via perf
> script python.
>
> Take a look at ./kernel/events/core.c, function perf_log_throttle(), the
> only one in the kernel that emits the PERF_RECORD_THROTTLE meta event
> and see that it always set the 'time', 'id' and 'stream_id' fields:
>
> static void perf_log_throttle(struct perf_event *event, int enable)
> {
>         struct perf_output_handle handle;
>         struct perf_sample_data sample;
>         int ret;
>
>         struct {
>                 struct perf_event_header        header;
>                 u64                             time;
>                 u64                             id;
>                 u64                             stream_id;
>         } throttle_event = {
>                 .header = {
>                         .type = PERF_RECORD_THROTTLE,
>                         .misc = 0,
>                         .size = sizeof(throttle_event),
>                 },
>                 .time           = perf_event_clock(event),
>                 .id             = primary_event_id(event),
>                 .stream_id      = event->id,
>         };
>
>         if (enable)
>                 throttle_event.header.type = PERF_RECORD_UNTHROTTLE;
>
>         perf_event_header__init_id(&throttle_event.header, &sample, event);
>
>         ret = perf_output_begin(&handle, &sample, event,
>                                 throttle_event.header.size);
>         if (ret)
>                 return;
>
>         perf_output_put(&handle, throttle_event);
>         perf_event__output_id_sample(event, &handle, &sample);
>         perf_output_end(&handle);
> }
>
> So instead of doing:
>
> +       tuple_set_u64(t, 0, sample->time);
> +       tuple_set_s32(t, 1, sample->cpu);
> +       tuple_set_s32(t, 2, sample->pid);
> +       tuple_set_s32(t, 3, sample->tid);
>
> That may not be set at all, you should do:
>
> +       tuple_set_u64(t, 0, event->throttle.time);
> +       tuple_set_u64(t, 1, event->throttle.id);
> +       tuple_set_u64(t, 2, event->throttle.stream_id);
>
> These can be left as is with 0xFFFFFFFFFFFFFFFF meaning its not set:
>
> +       tuple_set_s32(t, 3, sample->cpu);
> +       tuple_set_s32(t, 4, sample->pid);
> +       tuple_set_s32(t, 5, sample->tid);
>
> to confirm the user can look at perf_event_attr.sample_type |
> PERF_SAMPLE_X.

Thank you so much for this detailed explanation! I'll send v2 as you
described. This definitely helps me understand the data structures at
play much more.

Thanks,
Stephen

>
> - Arnaldo
>
>> Thanks,
>> Stephen
>> 
>> `perf report --header-only` from my testing perf.data:
>> 
>> # ========
>> # captured on    : Wed Aug 11 11:55:44 2021
>> # header version : 1
>> # data offset    : 264
>> # data size      : 18240
>> # feat offset    : 18504
>> # hostname : stepbren-ol7-2
>> # os release : 5.4.17-2102.203.6.el7uek.x86_64
>> # perf version : 5.4.17-2102.203.6.el7uek.x86_64
>> # arch : x86_64
>> # nrcpus online : 4
>> # nrcpus avail : 4
>> # cpudesc : Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz
>> # cpuid : GenuineIntel,6,85,4
>> # total memory : 30522624 kB
>> # cmdline : /usr/libexec/perf.5.4.17-2102.203.6.el7uek.x86_64 record -c
>> 100000 -e cycles -- sh -c echo 2000 > /proc/irqoff_sleep_millis
>> # event : name = cycles, , id = { 5, 6, 7, 8 }, size = 112, { sample_period,
>> sample_freq } = 100000, sample_type = IP|TID|TIME, read_format = ID,
>> disabled = 1, inherit = 1, mmap = 1, comm = 1, enable_on_exec = 1, task = 1,
>> sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1,
>> bpf_event = 1
>> # CPU_TOPOLOGY info available, use -I to display
>> # NUMA_TOPOLOGY info available, use -I to display
>> # pmu mappings: software = 1, uprobe = 7, cpu = 4, breakpoint = 5,
>> tracepoint = 2, kprobe = 6, msr = 8
>> # CACHE info available, use -I to display
>> # time of first sample : 228.321751
>> # time of last sample : 230.362698
>> # sample duration :   2040.947 ms
>> # MEM_TOPOLOGY info available, use -I to display
>> # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT
>> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
>> HYBRID_CPU_PMU_CAPS
>> # ========

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ