Date:   Tue, 7 Apr 2020 07:06:59 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Keno Fischer <keno@...iacomputing.com>
Cc:     linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andi Kleen <andi@...stfloor.org>,
        Kyle Huey <khuey@...ehuey.com>,
        Robert O'Callahan <robert@...llahan.org>
Subject: Re: [RFC PATCH v2] x86/arch_prctl: Add ARCH_SET_XCR0 to set XCR0
 per-thread

On 4/7/20 5:21 AM, Peter Zijlstra wrote:
> You had a fairly long changelog detailing what the patch does; but I've
> failed to find a single word on _WHY_ we want to do any of that.

The goal in these record/replay systems is to be able to recreate the
exact same program state on two systems at two different times.  To make
it reasonably fast, they try to minimize the number of snapshots they
have to take and avoid things like single stepping.

So, there are some windows where they just let the CPU run and don't
bother with taking any snapshots of register state, for instance.  Let's
say you read a word from shared memory, multiply it and shift it around
some registers, then stick it back in shared memory.  Most of these
systems will just record a snapshot at the memory read and assume
that all the instructions in the middle execute deterministically.  That
eliminates a ton of snapshots.
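A toy Python model of that idea, assuming a recorder that logs only the value returned by each shared-memory read and simply re-executes the register arithmetic on replay. The function names (`work`, `record`, `replay`) are hypothetical. This is an illustration of the principle, not a description of how any real recorder is implemented.

```python
# Toy model of snapshot-minimizing record/replay (illustrative only).
# Only the shared-memory read is logged; the register arithmetic in between
# is re-executed on replay and assumed to be deterministic.

MASK64 = (1 << 64) - 1  # emulate 64-bit register wraparound


def work(read_shared):
    """Read a word from shared memory, multiply and shift it, and return the
    value that would be stored back."""
    x = read_shared()        # nondeterministic: another thread may have written it
    x = (x * 3) & MASK64     # deterministic register arithmetic from here on
    x = (x << 4) & MASK64
    x >>= 1
    return x


def record(shared_value):
    """Run `work` against the current memory contents, logging only what the
    read returned."""
    log = []

    def read_shared():
        log.append(shared_value)
        return shared_value

    return work(read_shared), log


def replay(log):
    """Re-run `work`, feeding reads from the log instead of memory."""
    entries = iter(log)
    return work(lambda: next(entries))


result, log = record(0x1234)
print("log entries:", len(log))
print("replay matches:", replay(log) == result)
```

The log holds one entry per read rather than one per instruction, which is where the savings come from. The replay is only correct because everything `work` does between reads is deterministic.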

But what if an instruction in the middle isn't deterministic between
two machines?  Let's say you record a trace on a Broadwell system,
then try to replay it on a Skylake, and one of the non-snapshotted
instructions is xgetbv.  Skylake added MPX, so xgetbv will return a
different XCR0 value.  Your replay diverges from what was "recorded", and
life sucks.
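The same toy model can be extended so that one un-logged step depends on XCR0, which is the register xgetbv reads when ECX=0. The bit positions follow the Intel SDM: 0 is x87, 1 is SSE, 2 is AVX, 3 is BNDREGS and 4 is BNDCSR. The two XCR0 values are hypothetical examples for a Broadwell and an MPX-enabled Skylake. `replay_xcr0` is a hypothetical helper that stands in for what a per-thread XCR0 override like the proposed ARCH_SET_XCR0 would allow; it is not that interface.

```python
# Toy model of replay divergence caused by xgetbv (illustrative only).
# XCR0 bit positions follow the Intel SDM; the XCR0 values are hypothetical.

XCR0_BITS = {0: "x87", 1: "SSE", 2: "AVX", 3: "BNDREGS", 4: "BNDCSR"}

RECORD_XCR0 = 0x07   # hypothetical Broadwell: x87 | SSE | AVX
REPLAY_XCR0 = 0x1F   # hypothetical Skylake: also enables the two MPX components


def decode_xcr0(value):
    """Names of the state components enabled in an XCR0 value."""
    return [name for bit, name in sorted(XCR0_BITS.items()) if (value >> bit) & 1]


def work(shared_word, xgetbv):
    """Combine a logged memory read with an un-logged xgetbv result."""
    return shared_word ^ xgetbv()


def run(shared_word, xcr0):
    """Run `work` on a machine (or thread) whose xgetbv returns `xcr0`."""
    return work(shared_word, lambda: xcr0)


def replay_xcr0(recorded_xcr0, supported_xcr0):
    """Hypothetical helper: the XCR0 a replaying thread could run with if the
    kernel let it lower XCR0 per-thread. Components the CPU lacks cannot be
    enabled, so a recorded superset is rejected."""
    if recorded_xcr0 & ~supported_xcr0:
        raise ValueError("replay CPU lacks recorded XSAVE state components")
    return recorded_xcr0


log = [0x40]                                  # only the memory read was recorded
recorded = run(log[0], RECORD_XCR0)
replayed = run(log[0], REPLAY_XCR0)           # xgetbv now returns a different value
print("extra components on replay:", decode_xcr0(REPLAY_XCR0 & ~RECORD_XCR0))
print("diverged:", recorded != replayed)

fixed = run(log[0], replay_xcr0(RECORD_XCR0, REPLAY_XCR0))
print("matches after override:", fixed == recorded)
```

Such an override only works in one direction. A replay machine can mask XCR0 down to a recorded subset, but it cannot enable state components its CPU lacks, which is why the helper rejects that case.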

The same problem exists for CPUID, but that was hacked around in a
separate patch set (CPUID faulting via ARCH_SET_CPUID).

I'm also trying to think of what kinds of things CPU companies add to
their architectures that would break this stuff.  I can't recall ever
having a discussion with folks at Intel where we're designing a CPU
feature and we say, "Can't do that, it would break record/replay".  I
suspect there are more of these landmines around and I bet that we're
building more of them into CPUs every day.
