linux-kernel - Re: [RFC PATCH v2] x86/arch_prctl: Add ARCH_SET

Message-ID: <a5b07aa9-96ea-a9b5-13db-e5dcbd7760e6@intel.com>
Date:   Tue, 7 Apr 2020 07:06:59 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Keno Fischer <keno@...iacomputing.com>
Cc:     linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andi Kleen <andi@...stfloor.org>,
        Kyle Huey <khuey@...ehuey.com>,
        Robert O'Callahan <robert@...llahan.org>
Subject: Re: [RFC PATCH v2] x86/arch_prctl: Add ARCH_SET_XCR0 to set XCR0
 per-thread

On 4/7/20 5:21 AM, Peter Zijlstra wrote:
> You had a fairly long changelog detailing what the patch does; but I've
> failed to find a single word on _WHY_ we want to do any of that.

The goal in these record/replay systems is to be able to recreate the
exact same program state on two systems at two different times.  To make
it reasonably fast, they try to minimize the number of snapshots they
have to take and avoid things like single stepping.

So, there are some windows where they just let the CPU run and don't
bother with taking any snapshots of register state, for instance.  Let's
say you read a word from shared memory, multiply it and shift it around
some registers, then stick it back in shared memory.  Most of these
things will just record the snapshot at the memory read and assume
that all the instructions in the middle execute deterministically.  That
eliminates a ton of snapshots.
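
The scheme described above can be sketched as a toy in C.  This is only
an illustration under stated assumptions, not how any particular
record/replay tool is implemented: `struct trace`, `compute()`,
`record_step()` and `replay_step()` are hypothetical names.  Only the word
read from shared memory is logged; the arithmetic in the middle is simply
re-executed at replay time and trusted to produce the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_MAX 16

/* Hypothetical trace: only values read from shared memory are logged. */
struct trace {
    uint32_t inputs[TRACE_MAX];
    size_t count;   /* entries written while recording */
    size_t cursor;  /* next entry to consume while replaying */
};

/* The unsupervised window: pure register arithmetic, no snapshots. */
static uint32_t compute(uint32_t word)
{
    uint32_t x = word * 3u;          /* multiply it */
    return (x << 5) | (x >> 27);     /* shift it around: rotate left by 5 */
}

/* Record: snapshot the memory read, then let the CPU run. */
static int record_step(struct trace *t, const volatile uint32_t *shared,
                       uint32_t *out)
{
    if (t->count >= TRACE_MAX)
        return -1;                   /* trace is full */
    uint32_t word = *shared;
    t->inputs[t->count++] = word;
    *out = compute(word);
    return 0;
}

/* Replay: feed the logged word back in and assume compute() is deterministic. */
static int replay_step(struct trace *t, uint32_t *out)
{
    assert(t->cursor <= t->count);
    if (t->cursor == t->count)
        return -1;                   /* nothing left to replay */
    *out = compute(t->inputs[t->cursor++]);
    return 0;
}
```

Replay never looks at the shared memory again, so it reproduces the
recorded result only as long as every instruction inside `compute()`
behaves identically on both machines.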

But, what if an instruction in the middle isn't deterministic between
two machines?  Let's say you record a trace on a Broadwell system,
then try to replay it on a Skylake, and one of the non-snapshotted
instructions is xgetbv.  Skylake added MPX, so xgetbv will return
different values.  Your replay diverges from what was "recorded", and
life sucks.
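
To make the divergence concrete, here is a small C sketch.  The XCR0 bit
positions (x87, SSE, AVX, and the two MPX components BNDREGS and BNDCSR)
follow the Intel SDM; the two example masks are illustrative assumptions
about what a machine without and with MPX enabled might report, not
values captured from real hardware.  The helper names are hypothetical.
`read_xcr0()` is only built on x86-64 with GCC/Clang-style inline asm and
assumes the OS has enabled XSAVE.

```c
#include <assert.h>
#include <stdint.h>

/* XCR0 state-component bits. */
#define XCR0_X87      (1ull << 0)
#define XCR0_SSE      (1ull << 1)
#define XCR0_AVX      (1ull << 2)
#define XCR0_BNDREGS  (1ull << 3)   /* MPX bounds registers */
#define XCR0_BNDCSR   (1ull << 4)   /* MPX config/status */

/* Illustrative masks (assumption): recording vs. replay machine. */
#define XCR0_NO_MPX   (XCR0_X87 | XCR0_SSE | XCR0_AVX)
#define XCR0_WITH_MPX (XCR0_NO_MPX | XCR0_BNDREGS | XCR0_BNDCSR)

/* Test a single state-component bit. */
static int xcr0_test_bit(uint64_t xcr0, unsigned bit)
{
    assert(bit < 64);
    return (int)((xcr0 >> bit) & 1u);
}

static int xcr0_has_mpx(uint64_t xcr0)
{
    return xcr0_test_bit(xcr0, 3) && xcr0_test_bit(xcr0, 4);
}

/* Bits that differ between what was recorded and what replay would see. */
static uint64_t xcr0_diff(uint64_t recorded, uint64_t replayed)
{
    return recorded ^ replayed;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* xgetbv with ECX = 0 returns XCR0 in EDX:EAX and is usable from user space. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}
#endif
```

With these masks, `xcr0_diff(XCR0_NO_MPX, XCR0_WITH_MPX)` is 0x18: exactly
the two MPX bits.  A program that executed xgetbv during recording would
see a different value during replay.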

Same problem exists for CPUID, but that was hacked around in another set.

I'm also trying to think of what kinds of things CPU companies add to
their architectures that would break this stuff.  I can't recall ever
having a discussion with folks at Intel where we're designing a CPU
feature and we say, "Can't do that, it would break record/replay".  I
suspect there are more of these landmines around and I bet that we're
building more of them into CPUs every day.