lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 8 Jun 2022 17:07:48 -0700
From:   Ian Rogers <irogers@...gle.com>
To:     Rob Herring <robh@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Kajol Jain <kjain@...ux.ibm.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        linux-perf-users <linux-perf-users@...r.kernel.org>,
        Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH 2/4] perf: Align user space counter reading with code

On Wed, Jun 8, 2022 at 4:25 PM Rob Herring <robh@...nel.org> wrote:
>
> On Wed, Jun 8, 2022 at 4:44 PM Ian Rogers <irogers@...gle.com> wrote:
> >
> > Align the user space counter reading documentation with the code in
> > perf_mmap__read_self. Previously the documentation was based on the perf
> > rdpmc test, but now general purpose code is provided by libperf.
>
> IMO, this copy of not quite right code should just be removed perhaps
> with a pointer to perf_mmap__read_self(). It will just get out of date
> again. For example, the read loop might get rewritten with restartable
> sequences.

Thanks. The intent with the rdpmc test and the header is they be in
sync. The test flakes when testing at scale, why I'm here. Having the
code in the header makes it clear this is a Linux API and subject to
the Linux syscall note. I like the code in the header but not that it
is out of sync with the code used currently to read it.

> > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > ---
> >  include/uapi/linux/perf_event.h       | 32 ++++++++++++++++-----------
> >  tools/include/uapi/linux/perf_event.h | 32 ++++++++++++++++-----------
> >  2 files changed, 38 insertions(+), 26 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index d37629dbad72..3b84e0ad0723 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -538,9 +538,13 @@ struct perf_event_mmap_page {
> >          *
> >          *     if (pc->cap_usr_time && enabled != running) {
> >          *       cyc = rdtsc();
>
> Kind of x86 specific.

There is a reference to it below and so I let it be, the name rdpmc is
also very x86 but something of a sailed ship.

> > -        *       time_offset = pc->time_offset;
> >          *       time_mult   = pc->time_mult;
> >          *       time_shift  = pc->time_shift;
> > +        *       time_offset = pc->time_offset;
> > +        *       if (pc->cap_user_time_short) {
> > +        *         time_cycles = pc->time_cycles;
> > +        *         time_mask = pc->time_mask;
> > +        *       }
>
> This still misses the need for READ_ONCE().

I'd elided that as it is something of.a kernel API. I can add it back.

> >          *     }
> >          *
> >          *     index = pc->index;
> > @@ -548,6 +552,9 @@ struct perf_event_mmap_page {
> >          *     if (pc->cap_user_rdpmc && index) {
> >          *       width = pc->pmc_width;
> >          *       pmc = rdpmc(index - 1);
> > +        *       pmc <<= 64 - width;
> > +        *       pmc >>= 64 - width;
> > +        *       count += pmc;
> >          *     }
> >          *
> >          *     barrier();
> > @@ -590,25 +597,24 @@ struct perf_event_mmap_page {
> >          * If cap_usr_time the below fields can be used to compute the time
> >          * delta since time_enabled (in ns) using rdtsc or similar.
> >          *
> > -        *   u64 quot, rem;
> > -        *   u64 delta;
> > -        *
> > -        *   quot = (cyc >> time_shift);
> > -        *   rem = cyc & (((u64)1 << time_shift) - 1);
> > -        *   delta = time_offset + quot * time_mult +
> > -        *              ((rem * time_mult) >> time_shift);
> > +        *   cyc = time_cycles + ((cyc - time_cycles) & time_mask);
> > +        *   delta = time_offset + mul_u64_u32_shr(cyc, time_mult, time_shift);
>
> For documentation purposes, the original code was easier to read and
> this is just an optimization. What does mul_u64_u32_shr() do exactly?
> It's not documented.

My concern with expanding the function is the header and the code
aren't in sync and so we're not testing what we're advertising.
Finding the definition is not a huge challenge imo.

Thanks,
Ian

> >          *
> >          * Where time_offset,time_mult,time_shift and cyc are read in the
> >          * seqcount loop described above. This delta can then be added to
> > -        * enabled and possible running (if index), improving the scaling:
> > +        * enabled and possible running (if index) to improve the scaling. Due
> > +        * to event multiplexing, running maybe zero and so care is needed to
> > +        * avoid division by zero.
> >          *
> >          *   enabled += delta;
> > -        *   if (index)
> > +        *   if (idx)
> >          *     running += delta;
> >          *
> > -        *   quot = count / running;
> > -        *   rem  = count % running;
> > -        *   count = quot * enabled + (rem * enabled) / running;
> > +        *   if (running != 0) {
> > +        *     quot = count / running;
> > +        *     rem  = count % running;
> > +        *     count = quot * enabled + (rem * enabled) / running;
> > +        *   }
> >          */
> >         __u16   time_shift;
> >         __u32   time_mult;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ