Date:   Wed, 6 Oct 2021 09:09:46 -0700
From:   Namhyung Kim <namhyung@...nel.org>
To:     Leo Yan <leo.yan@...aro.org>
Cc:     German Gomez <german.gomez@....com>,
        James Clark <james.clark@....com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...hat.com>, Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Ian Rogers <irogers@...gle.com>,
        Stephane Eranian <eranian@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events

Hi Leo and German,

On Wed, Oct 6, 2021 at 2:36 AM Leo Yan <leo.yan@...aro.org> wrote:
>
> Hi German,
>
> On Tue, Oct 05, 2021 at 11:06:12AM +0100, German Gomez wrote:
>
> [...]
>
> > Yesterday we did some testing and found that there seems to be an exact
> > match between using context packets and switch events. However this only
> > applies when tracing in userspace (by adding the 'u' suffix to the perf
> > event). Otherwise we still see as much as 2% of events having the wrong
> > PID around the time of the switch.
>
> This result sounds reasonable to me: if we only trace userspace,
> there should be no difference between using context packets and
> switch events.
>
> The deviation with switch events is a bit high (1.30% in the result
> below) after enabling kernel tracing.

Yes, it's bigger than I expected, but it's probably workload-specific.

>
> > In order to measure this I applied Namhyung's patch and James's patch
> > from [1]. Then added a printf line to the function arm_spe_prep_sample
> > where I have access to both PID values, as a quick way to compare them
> > later in a perf-report run. This is the diff of the printf patch:
> >
> > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> > index 41385ab96fbc..591985c66ac4 100644
> > --- a/tools/perf/util/arm-spe.c
> > +++ b/tools/perf/util/arm-spe.c
> > @@ -247,6 +247,8 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
> >     event->sample.header.type = PERF_RECORD_SAMPLE;
> >     event->sample.header.misc = sample->cpumode;
> >     event->sample.header.size = sizeof(struct perf_event_header);
> > +
> > +       printf(">>>>>> %d / %lu\n", speq->tid, record->context_id & 0x7fff);
> >  }
> >
> > The error percentages were obtained by running the following
> > perf-record commands for different configurations:
> >
> > $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/u -a -- sleep 60
> > $ sudo ./perf report --stdio \
> >     | grep ">>>>>>" \
> >     | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}'
> >
> > Error: 0% out of 11839328 samples
> >
> > $ sudo ./perf record -e arm_spe/ts_enable=1,load_filter=1,store_filter=1/ -a -- sleep 10
> > $ sudo ./perf report --stdio \
> >     | grep ">>>>>>" \
> >     | awk '{total++; if($2!=$4) miss++} END {print "Error: " (100*miss/total) "% out of " total " samples"}'
> >
> > Error: 1.30624% out of 3418731 samples
>
> Thanks for sharing this!
>
> > I think the fallback to using switch when we can't use the CONTEXTIDR
> > register is a viable option for userspace events, but maybe not so much
> > for non-userspace.
>
> Agreed.
>
> If so, it's good to check the variable
> 'evsel->core.attr.exclude_kernel' when decoding Arm SPE trace data, and
> only use context switch events when 'exclude_kernel' is set.

I think it'd be better to check it in perf record and not set
evsel->core.attr.context_switch if possible.

Or it can ignore the context switch once it sees a context packet.
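
A rough, untested sketch of that second option. The struct and function
names here are made up for illustration and do not exist in arm-spe.c:

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical per-queue state; not the real struct arm_spe_queue. */
struct pid_tracker {
	pid_t pid;            /* PID currently attributed to samples */
	bool seen_ctx_packet; /* set once the trace yields a context packet */
};

static void pid_tracker_init(struct pid_tracker *t)
{
	t->pid = -1;
	t->seen_ctx_packet = false;
}

/* Called when the decoder finds a context packet in the SPE trace. */
static void pid_tracker_on_ctx_packet(struct pid_tracker *t, pid_t pid)
{
	t->seen_ctx_packet = true;
	t->pid = pid;
}

/*
 * Called for a context switch-in event.  Switch events are only a
 * fallback: once a context packet has been seen they are ignored.
 */
static void pid_tracker_on_switch(struct pid_tracker *t, pid_t next_pid)
{
	if (!t->seen_ctx_packet)
		t->pid = next_pid;
}
```

With this shape the decoder needs no knowledge of how the kernel was
configured; it just prefers context packets whenever the trace has them.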

>
> One thing to note here is that the perf tool needs to know whether
> bit 3 'CX' (macro 'SYS_PMSCR_EL1_CX_SHIFT' in the kernel) has been set
> in register PMSCR.  AFAIK, the Arm SPE driver doesn't expose any
> interface (or config) to userspace for context tracing, so one method
> is to add an extra config in the Arm SPE driver for this bit, e.g.
> 'ATTR_CFG_FLD_cx_enable_CFG' can be added in the Arm SPE driver.
>
> Alternatively, rather than adding a new config, I am wondering whether
> we could simply use two flags in perf's decoding:
> 'use_switch_event_for_pid' and 'use_ctx_packet_for_pid'.  The first
> flag will be set if perf detects that the tracing is userspace only;
> the second will be set when it detects that the hardware trace
> contains context packets.  So if 'use_ctx_packet_for_pid' has been
> set, the decoder will always use context packets for the sample's PID;
> otherwise, it falls back to checking 'use_switch_event_for_pid' and
> sets the sample PID based on switch events.
>
> If you have any other ideas, please feel free to bring them up.

If it's just a kernel config option, we can check /proc/config.gz or
/boot/config-$(uname -r).  When perf knows for sure that context
packets are enabled, it can just use them; otherwise it needs the
context switch events.
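
A rough sketch of the /boot/config check, assuming the option in
question is CONFIG_PID_IN_CONTEXTIDR (this thread does not name it).
The helper names are made up, and /proc/config.gz is left out because
it is gzip-compressed and would need zlib:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* True if the line is exactly "<option>=y", with or without a newline. */
static bool config_line_enables(const char *line, const char *option)
{
	size_t len = strlen(option);

	if (strncmp(line, option, len) != 0)
		return false;
	return strcmp(line + len, "=y") == 0 ||
	       strcmp(line + len, "=y\n") == 0;
}

/*
 * Scan /boot/config-<release> for the option.
 * Returns 1 if enabled, 0 if not enabled, -1 if the file can't be read
 * (in which case the caller should keep using context switch events).
 */
static int boot_config_has(const char *option)
{
	struct utsname uts;
	char path[sizeof(uts.release) + 32];
	char line[512];
	FILE *fp;
	int found = 0;

	if (uname(&uts) < 0)
		return -1;

	snprintf(path, sizeof(path), "/boot/config-%s", uts.release);
	fp = fopen(path, "r");
	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		if (config_line_enables(line, option)) {
			found = 1;
			break;
		}
	}
	fclose(fp);
	return found;
}
```

Only a return value of 1 would let perf skip the context switch
fallback; 0 and -1 both mean it still has to ask for switch events.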

Thanks,
Namhyung
