[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANDhNComKRDdZJ8SJECNdoAzQhmR3vu9yKAtp7NKDmECxff=fg@mail.gmail.com>
Date: Fri, 17 Mar 2023 23:02:50 -0700
From: John Stultz <jstultz@...gle.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, peterz@...radead.org,
mingo@...hat.com, linux-kernel@...r.kernel.org, sboyd@...nel.org,
eranian@...gle.com, namhyung@...nel.org, ak@...ux.intel.com,
adrian.hunter@...el.com
Subject: Re: [RFC PATCH V2 2/9] perf: Extend ABI to support post-processing
monotonic raw conversion
On Mon, Mar 13, 2023 at 2:19 PM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>
>
>
> On 2023-03-11 12:55 a.m., John Stultz wrote:
> > On Thu, Mar 9, 2023 at 8:56 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
> >> On 2023-03-08 8:17 p.m., John Stultz wrote:
> >>> So I spent a little bit of time today adding some trace_printks to the
> >>> timekeeping code so I could record the actual TSC and timestamps being
> >>> calculated from CLOCK_MONOTONIC_RAW.
> >>>
> >>> I did catch one error in the test code, which unfortunately I'm to blame for:
> >>> mid = start + (delta +(delta/2))/2; //round-closest
> >>>
> >>> That should be
> >>> mid = start + (delta +(2/2))/2 //round-closest
> >>> or more simply
> >>> mid = start + (delta +1)/2; //round-closest
> >>>
> >>> Generalized rounding should be: (value + (DIV/2))/DIV), but I'm
> >>> guessing with two as the divisor, my brain mixed it up and typed
> >>> "delta". My apologies!
> >>>
> >>> With that fix, I'm seeing closer to ~500ns of error in the
> >>> interpolation, just using the userland sampling. Now, I've also
> >>> disabled vsyscalls for this (otherwise I wouldn't be able to
> >>> trace_printk), so the error likely would be higher than with
> >>> vsyscalls.
> >>>
> >>> Now, part of the error is that:
> >>> start= rdtsc();
> >>> clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
> >>> end = rdtsc();
> >>>
> >>> Ends up looking like
> >>> start= rdtsc();
> >>> clock_gettime() {
> >>> now = rdtsc();
> >>> delta = now - last;
> >>> ns = (delta * mult) >> shift
> >>> [~midpoint~]
> >>> ts->nsec = base_ns + ns;
> >>> ts->sec = base_sec;
> >>> normalize_ts(ts)
> >>> }
> >>> end = rdtsc();
> >>>
> >>> And so by taking the mid-point we're always a little skewed from where
> >>> the tsc was actually read. Looking at the data for my case the tsc
> >>> read seems to be ~12% in, so you could instead try:
> >>>
> >>> delta = end - start;
> >>> p12 = start + ((delta * 12) + (100/2))/100;
> >>>
> >>> With that adjustment, I'm seeing error around ~40ns.
> >>>
> >>> Mind giving that a try?
> >>
> >> I tried both the new mid and p12. The error becomes even larger.
> >>
> >> With new mid (start + (delta +1)/2), the error is now ~3800ns
> >> With p12 adjustment, the error is ~6700ns.
> >>
> >>
> >> Here is how I run the test.
> >> $./time
> >> $perf record -e cycles:upp --clockid monotonic_raw $some_workaround
> >> $./time
> >>
> >> Here are some raw data.
> >>
> >> For the first ./time,
> >> start: 961886196018
> >> end: 961886215603
> >> MONO_RAW: 341485848531
> >>
> >> For the second ./time,
> >> start: 986870117783
> >> end: 986870136152
> >> MONO_RAW: 351495432044
> >>
> >> Here is the time generated from one PEBS record.
> >> TSC: 968210217271
> >> PEBS_MONO_RAW (calculated via kernel conversion information): 344019503072
> >>
> >> Using new mid (start + (delta +1)/2), the guessed PEBS_MONO_RAW is
> >> 344019506897. The error is 3825ns.
> >> Using p12 adjustment, the guessed PEBS_MONO_RAW is 344019509831.
> >> The error is 6759ns
> >
> > Huh. I dunno. That seems wild that the error increased.
> >
> > Just in case something is going astray with the PEBS_MONO_RAW logic,
> > can you apply the hack patch I was using to display the MONOTONIC_RAW
> > values the kernel calculates?
> > https://github.com/johnstultz-work/linux-dev/commit/8d7896b078965b059ea5e8cc21841580557f6df6
> >
> > It uses trace_printk, so you'll have to cat /sys/kernel/tracing/trace
> > to get the output.
> >
>
>
> $ ./time_3
> start: 7358368893806 end: 7358368902944 delta: 9138
> MONO_RAW: 2899739790738
> MID: 7358368898375
> P12: 7358368894903
> $ sudo cat /sys/kernel/tracing/trace | grep time_3
> time_3-1443 [002] ..... 2899.858936: ktime_get_raw_ts64:
> JDB: timekeeping_get_delta cycle_now: 7358368897679
> time_3-1443 [002] ..... 2899.858937: ktime_get_raw_ts64:
> JDB: ktime_get_raw_ts64: 2899739790738
>
> The error between MID and cycle_now is -696ns
> The error between P12 and cycle_now is 2776ns
Hey Kan,
So I'm terribly sorry, I'm a bit underwater right now and haven't
had time to look deeper at this. The MID case you have above looks
closer to what I was seeing but I can't explain why the 12% case is
worse.
Since I feel it's not really fair to object to your patch but not have
the time to work through an alternative with you, I'm going to
withdraw my objection (though others may persist!).
I'd still really prefer if we avoided exposing internal timekeeping
state directly to userland, and it would be good to see some further
exploration in other directions, but there is the existing perf mmap
precedence (even if I dislike it). Sorry I can't be of more help to
find a better approach here. :(
thanks
-john
Powered by blists - more mailing lists