linux-kernel - Re: [RFC PATCH V2 2/9] perf: Extend ABI to support post-processing monotonic raw conversion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANDhNCpnoRjrX7k7spDdnUgwzygVUoaF5u7A2-heQfCuokcN6Q@mail.gmail.com>
Date:   Wed, 8 Mar 2023 17:17:57 -0800
From:   John Stultz <jstultz@...gle.com>
To:     "Liang, Kan" <kan.liang@...ux.intel.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>, peterz@...radead.org,
        mingo@...hat.com, linux-kernel@...r.kernel.org, sboyd@...nel.org,
        eranian@...gle.com, namhyung@...nel.org, ak@...ux.intel.com,
        adrian.hunter@...el.com
Subject: Re: [RFC PATCH V2 2/9] perf: Extend ABI to support post-processing
 monotonic raw conversion

On Wed, Mar 8, 2023 at 10:44 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
> On 2023-02-17 6:11 p.m., John Stultz wrote:
> > On Tue, Feb 14, 2023 at 12:38 PM Liang, Kan <kan.liang@...ux.intel.com> wrote:
> >> On 2023-02-14 3:11 p.m., John Stultz wrote:
> >>> On Tue, Feb 14, 2023 at 9:00 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
> >>>> On 2023-02-14 9:51 a.m., Liang, Kan wrote:
> >>>>> If I understand correctly, the idea is to let the user space tool run
> >>>>> the above interpoloation algorithm several times to 'guess' the atomic
> >>>>> mapping. Using the mapping information to covert the TSC from the PEBS
> >>>>> record. Is my understanding correct?
> >>>>>
> >>>>> If so, to be honest, I doubt we can get the accuracy we want.
> >>>>>
> >>>>
> >>>> I implemented a simple test to evaluate the error.
> >>>
> >>> Very cool!
> >>>
> >>>> I collected TSC -> CLOCK_MONOTONIC_RAW mapping using the above algorithm
> >>>> at the start and end of perf cmd.
> >>>>         MONO_RAW        TSC
> >>>> start   89553516545645  223619715214239
> >>>> end     89562251233830  223641517000376
> >>>>
> >>>> Here is what I get via mult/shift conversion from this patch.
> >>>>         MONO_RAW        TSC
> >>>> PEBS    89555942691466  223625770878571
> >>>>
> >>>> Then I use the time information from start and end to create a linear
> >>>> function and 'guess' the MONO_RAW of PEBS from the TSC. I get
> >>>> 89555942692721.
> >>>> There is a 1255 ns difference.
> >>>> I tried several different PEBS records. The error is ~1000ns.
> >>>> I think it should be an observable error.
> >>>
> >>> Interesting. That's a good bit higher than I'd expect as I'd expect a
> >>> clock_gettime() call to take ~ double digit nanoseconds range on
> >>> average, so the error should be within that.
> >>>
> >>> Can you share your logic?
> >>>
> >>
> >> I run the algorithm right before and after the perf command as below.
> >> (The source code of time is attached.)
> >>
> >> $./time
> >> $perf record -e cycles:upp --clockid monotonic_raw $some_workaround
> >> $./time
> >>
> >> The time will dump both MONO_RAW and TSC. That's where "start" and "end"
> >> from.
> >> The perf command print out both TSC and converted MONO_RAW (using the
> >> mul/shift from this patch series). That's where "PEBS" value from.
> >>
> >> Than I use the below formula to calculate the guessed MONO_RAW of PEBS TSC.
> >> Guessed_MONO_RAW = (PEBS_TSC - start_TSC) / (end_TSC - start_TSC) *
> >> (end_MONO_RAW - start_MONO_RAW) + start_MONO_RAW.
> >>
> >> The guessed_MONO_RAW is 89555942692721.
> >> The PEBS_MONO_RAW is 89555942691466.
> >> The difference is 1255.
> >>
> >> Is the calculation correct?
> >
> > Thanks for sharing it. The equation you have there looks ok at a high
> > level for the values you captured (there's small tweaks like doing the
> > mult before the div to make sure you don't hit integer precision
> > issues, but I didn't see that with your results).
> >
> > I've got a todo to try to see how the calculation changes if we do
> > provide atomic TSC/RAW stamps, here but I got a little busy with other
> > work and haven't gotten to it.
> > So my apologies, but I'll try to get back to this soon.
> >
>
> Have you got a chance to try the idea?
>
> I just want to check whether the userspace interpolation approach works.
> Should I prepare V3 and go back to the kernel solution?

Oh, my apologies. I had some other work come up and this fell off my plate.

So I spent a little bit of time today adding some trace_printks to the
timekeeping code so I could record the actual TSC and timestamps being
calculated from CLOCK_MONOTONIC_RAW.

I did catch one error in the test code, which unfortunately I'm to blame for:
  mid = start + (delta +(delta/2))/2; //round-closest

That should be
  mid = start + (delta +(2/2))/2  //round-closest
or more simply
  mid = start + (delta +1)/2; //round-closest

Generalized rounding should be: (value + (DIV/2))/DIV), but I'm
guessing with two as the divisor, my brain mixed it up and typed
"delta". My apologies!

With that fix, I'm seeing closer to ~500ns of error in the
interpolation, just using the userland sampling.   Now, I've also
disabled vsyscalls for this (otherwise I wouldn't be able to
trace_printk), so the error likely would be higher than with
vsyscalls.

Now, part of the error is that:
  start= rdtsc();
  clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
  end = rdtsc();

Ends up looking like
  start= rdtsc();
  clock_gettime() {
     now = rdtsc();
     delta = now - last;
     ns = (delta * mult) >> shift
[~midpoint~]
     ts->nsec = base_ns + ns;
     ts->sec = base_sec;
     normalize_ts(ts)
  }
  end = rdtsc();

And so by taking the mid-point we're always a little skewed from where
the tsc was actually read.  Looking at the data for my case the tsc
read seems to be ~12% in, so you could instead try:

delta = end - start;
p12 = start + ((delta * 12) + (100/2))/100;

With that adjustment, I'm seeing error around ~40ns.

Mind giving that a try?

Now, if you had two snapshots of MONOTONIC_RAW + the TSC value used to
calculate it(maybe the driver access this via a special internal
timekeeping interface), in my testing interpolating will give you
sub-ns error. So I think this is workable without exposing quite so
much to userland.

thanks
-john