Message-ID: <70ec475d45644c18b782d8bbe6a4e921@AcuMS.aculab.com>
Date:   Mon, 25 Jul 2022 07:35:55 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     "'Luck, Tony'" <tony.luck@...el.com>
CC:     "Sun, Yi" <yi.sun@...el.com>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "Mehta, Sohil" <sohil.mehta@...el.com>,
        "Su, Heng" <heng.su@...el.com>
Subject: RE: [PATCH 1/2] x86/fpu: Measure the Latency of XSAVE and XRSTOR

From: Luck, Tony
> Sent: 25 July 2022 00:00
> 
> RDTSC has returned values independent of the current frequency since Nehalem (modulo a few hiccoughs). So
> any CPU with XSAVE/XRSTOR should be safe to measure using the TSC.

Indeed - that is exactly why you can't use the TSC to measure
instruction latency any more.
You need to measure latency in clock cycles, not time.

On CPUs where all the cores run at the same frequency, you can
see the effect by spinning one core in userspace: that changes the
clock of every core, and with it the latency the TSC reports.
Running 'while :; do :; done' from a shell prompt is pretty
effective at spinning in userspace.

	David

> 
> > On Jul 24, 2022, at 14:16, David Laight <David.Laight@...lab.com> wrote:
> >
> > From: Yi Sun
> >> Sent: 23 July 2022 09:38
> >>
> >> Calculate the latency of the XSAVE and XRSTOR instructions with the new
> >> trace points x86_fpu_latency_xsave and x86_fpu_latency_xrstor.
> >>
> >> The delta TSC can be calculated within a single trace event. Another
> >> option considered was to have two separate trace events marking the
> >> start and finish of the xsave/xrstor instructions, with the delta TSC
> >> calculated from the two trace points in user space, but the trace
> >> function itself added significant overhead.
> >>
> >> In internal testing, the single trace point option which is
> >> implemented here proved to be more accurate.
> > ...
> >
> > I've done some experiments that measure short instruction latencies.
> > Basically I found:
> > 1) You need a suitable serialising instruction before and after
> >   the code being tested - otherwise it can overlap whatever
> >   you are using for timing.
> > 2) The only reliable counter is the performance monitor clock
> >   counter - everything else measures time, and the time taken
> >   depends on the current CPU frequency. On Intel CPUs the
> >   frequency can change all the time.
> > Allowing for that, and then ignoring complete outliers, I could
> > get clock-count-accurate values for iterations of the IP csum loop.
> >
> >    David
> >
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> >

