lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 19 Oct 2021 16:28:31 +0100
From:   Mark Rutland <mark.rutland@....com>
To:     Rob Herring <robh@...nel.org>
Cc:     Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...hat.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Ian Rogers <irogers@...gle.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Honnappa Nagarahalli <honnappa.nagarahalli@....com>,
        Zachary.Leaf@....com, Raphael Gault <raphael.gault@....com>,
        Jonathan Cameron <Jonathan.Cameron@...wei.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Itaru Kitayama <itaru.kitayama@...il.com>,
        Vince Weaver <vincent.weaver@...ne.edu>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        linux-perf-users@...r.kernel.org
Subject: Re: [PATCH v10 4/5] arm64: perf: Enable PMU counter userspace access
 for perf event

Hi Rob,

On Thu, Oct 14, 2021 at 02:24:46PM -0500, Rob Herring wrote:
> On Thu, Oct 14, 2021 at 11:58 AM Mark Rutland <mark.rutland@....com> wrote:
> > On Tue, Sep 14, 2021 at 03:47:59PM -0500, Rob Herring wrote:
> > For the `config1 = 3` case (potentially) overriding the usual long
> > semantic, I'm struggling to understand why we need that rather than
> > forcing the use of a 64-bit counter, because in that case:
> >
> > * For a CPU_CYCLES event:
> >   __armv8_pmuv3_map_event() will always pick 64-bits
> >   get_event_idx() may fail to allocate a 64-bit counter.
> >
> > * For other events:
> >   __armv8_pmuv3_map_event() will pick 32/64 based on long counter
> >   support
> >   get_event_idx() will only fail if there are no counters free.
> >
> > Whereas if __armv8_pmuv3_map_event() returned an error for the latter
> > when long counter support is not implemented, we'd have consistent
> > `long` semantics, and the CPU_CYCLES behaviour would be identical.
> >
> > What's the rationale for `3` leaving the choice to the kernel?
> 
> It's the give me the maximum sized counter the h/w can support choice.
> That's easier for userspace to implement. Bit 1 is more of a hint that
> the user wants userspace access rather than a requirement.
> 
> > If the problem is discoverability, I'd be happy to add something to
> > sysfs to describe whether the PMU has long event support.
> 
> Checking sysfs or a try for 64-bit support then fall back to 32-bit
> support isn't much difference.
> 
> Keep in mind that x86 always succeeds here. Every userspace user will
> have to add whatever dance we create here. For example, each libperf
> test with user access (there's only 2 in my tree, but there's a series
> adding more) has to have an '#ifdef __aarch64__' for whatever we do
> here. I was seeking to minimize that. Right now, that's just a set
> config1 to 0x3. Also, note that libperf will opportunistically use a
> userspace read instead of read(). The user just has to mmap the event
> and libperf will use a userspace read when enabled which ultimately
> depends on what the mmapped page says.

I think that x86 always succeeding here is more of a legacy thing that
they're stuck with rather than a design to be copied.

I'd prefer to keep the existing meaning of the `long` flag to mean "give
me 64 bits of counter, somehow", with `rdpmc` meaning "give me a single
counter I can access from userspace", even if that means the combination
of the two can sometimes be rejected. As you say, we can probe for that
as necessary by trying `long` then falling back to a plain event, and if
that ends up being a bottleneck somehow we can figure out a way of
advertising support to userspace. Regardless, we should 

Importantly, I don't think libperf should override a user's request for
`long`, since the user may want to optimize for minimal perturbation
rather than faster access.

If we want a "please give me the longest counter that's compatible with
other constraints", I think that should be a new flag e.g. `trylong`,
and shouldn't override the existing `long`. We can add that as a
follow-up if we want it.

Thanks,
Mark.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ