Message-ID: <CANDhNCqMaqg1S4Vt_6Pe6M-9seGwA8Hxb8vR5KnLaByvG1JANg@mail.gmail.com>
Date:   Mon, 23 Jan 2023 22:56:12 -0800
From:   John Stultz <jstultz@...gle.com>
To:     kan.liang@...ux.intel.com
Cc:     peterz@...radead.org, mingo@...hat.com, tglx@...utronix.de,
        sboyd@...nel.org, linux-kernel@...r.kernel.org, eranian@...gle.com,
        namhyung@...nel.org, ak@...ux.intel.com
Subject: Re: [PATCH 3/3] perf/x86/intel/ds: Support monotonic clock for PEBS

On Mon, Jan 23, 2023 at 10:27 AM <kan.liang@...ux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@...ux.intel.com>
>
> Users want to reconcile user-space samples with PEBS samples and
> therefore need a common clock source. However, the current PEBS code
> only converts to sched_clock, which is not available from user space.
>
> Only support converting to clock monotonic. Having one common clock
> source is good enough to fulfill the requirement.
>
> Enable large PEBS for the monotonic clock to reduce the PEBS
> overhead.
>
> There are a few rare cases that may make the conversion fail, for
> example a TSC overflow, or cycle_last changing between samples. In
> those cases the time falls back to the inaccurate SW timestamps, but
> they are extremely unlikely to happen.
>
> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
> ---

Thanks for sending this out!
A few minor style issues below and a warning.

> The patch has to be applied on top of the patch below:
> https://lore.kernel.org/all/20230123172027.125385-1-kan.liang@linux.intel.com/
>
>  arch/x86/events/intel/core.c |  2 +-
>  arch/x86/events/intel/ds.c   | 30 ++++++++++++++++++++++++++----
>  2 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 14f0a746257d..ea194556cc73 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3777,7 +3777,7 @@ static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event)
>  {
>         unsigned long flags = x86_pmu.large_pebs_flags;
>
> -       if (event->attr.use_clockid)
> +       if (event->attr.use_clockid && (event->attr.clockid != CLOCK_MONOTONIC))
>                 flags &= ~PERF_SAMPLE_TIME;
>         if (!event->attr.exclude_kernel)
>                 flags &= ~PERF_SAMPLE_REGS_USER;
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 7980e92dec64..d7f0eaf4405c 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1570,13 +1570,33 @@ static u64 get_data_src(struct perf_event *event, u64 aux)
>         return val;
>  }
>
> +static int pebs_get_synctime(struct system_counterval_t *system,
> +                            void *ctx)

Just because the abstract function type taken by
get_mono_fast_from_given_time() is vague doesn't mean the
implementation needs to be.
ctx is really a tsc value, right? So let's call it that to make this
a bit more readable.
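
Something like this maybe (just a sketch on my end, untested, relying
on the set_tsc_system_counterval() helper from the prerequisite patch
you linked):

static int pebs_get_synctime(struct system_counterval_t *system,
                             void *tsc)
{
        /* tsc points at the u64 TSC value captured from the PEBS record */
        *system = set_tsc_system_counterval(*(u64 *)tsc);
        return 0;
}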

> +{
> +       *system = set_tsc_system_counterval(*(u64 *)ctx);
> +       return 0;
> +}
> +
> +static inline int pebs_clockid_time(clockid_t clk_id, u64 tsc, u64 *clk_id_time)

clk_id_time is maybe a bit too fuzzy. It is really a mono_ns value,
right? Let's keep that explicit here (see the sketch after this hunk).

> +{
> +       /* Only support converting to clock monotonic */
> +       if (clk_id != CLOCK_MONOTONIC)
> +               return -EINVAL;
> +
> +       return get_mono_fast_from_given_time(pebs_get_synctime, &tsc, clk_id_time);
> +}
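
i.e. roughly (again, just a sketch, untested):

static inline int pebs_clockid_time(clockid_t clk_id, u64 tsc,
                                    u64 *mono_ns)
{
        /* Only support converting to clock monotonic */
        if (clk_id != CLOCK_MONOTONIC)
                return -EINVAL;

        return get_mono_fast_from_given_time(pebs_get_synctime, &tsc,
                                             mono_ns);
}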
> +
>  static void setup_pebs_time(struct perf_event *event,
>                             struct perf_sample_data *data,
>                             u64 tsc)
>  {
> -       /* Converting to a user-defined clock is not supported yet. */
> -       if (event->attr.use_clockid != 0)
> -               return;
> +       u64 time;

Again, "time" is too generic a term without any context here.
mono_nsec or something would be more clear.

> +
> +       if (event->attr.use_clockid != 0) {
> +               if (pebs_clockid_time(event->attr.clockid, tsc, &time))
> +                       return;
> +               goto done;
> +       }

Apologies for this warning/rant:

So, I do get that the NMI safety of the "fast" time accessors (along
with the "high performance" sounding name!) is attractive, but as
their use expands I worry the downsides of this interface aren't made
clear enough.

The fast accessors *can* see time discontinuities! Because the logic
is done without holding the tk_core.seq lock, if you are reading in
the middle of an NTP adjustment, you may find the current value to be
larger than the value you get the next time you read the clock. These
discontinuities are likely to be very small, but a negative delta will
look very large as a u64. So part of using these "fast *and unsafe*"
interfaces is that you get to keep both pieces when it breaks. Make
sure the code here that is using these interfaces guards against this
(zeroing out negative deltas), something like the sketch below.
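
Illustrative only (the helper name here is made up, not something in
this patch or the tree), but the idea is:

/*
 * Deltas computed from ktime_get_mono_fast_ns()-style readings can
 * occasionally go "backwards" by a tiny amount; clamp to zero so the
 * delta doesn't wrap around to a huge u64 value.
 */
static inline u64 pebs_safe_mono_delta(u64 earlier_ns, u64 later_ns)
{
        if (later_ns < earlier_ns)
                return 0;
        return later_ns - earlier_ns;
}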

thanks
-john
