Message-ID: <2C85DF08-0AA0-45CF-BD97-1149EF00C8B4@vmware.com>
Date:   Thu, 16 May 2019 18:53:05 +0000
From:   Nadav Amit <namit@...are.com>
To:     Andy Lutomirski <luto@...capital.net>
CC:     Paul Turner <pjt@...gle.com>,
        the arch/x86 maintainers <x86@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Andy Lutomirsky <luto@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jann Horn <jannh@...gle.com>
Subject: Re: [RFC] x86: Speculative execution warnings
> On May 14, 2019, at 10:15 AM, Andy Lutomirski <luto@...capital.net> wrote:
> 
> 
> 
> On May 14, 2019, at 10:00 AM, Nadav Amit <namit@...are.com> wrote:
> 
>>> On May 14, 2019, at 1:00 AM, Paul Turner <pjt@...gle.com> wrote:
>>> 
>>> From: Nadav Amit <namit@...are.com>
>>> Date: Fri, May 10, 2019 at 7:45 PM
>>> To: <x86@...nel.org>
>>> Cc: Borislav Petkov, <linux-kernel@...r.kernel.org>, Nadav Amit, Andy
>>> Lutomirsky, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Jann Horn
>>> 
>>>> It may be useful to check at runtime whether certain assertions are
>>>> violated even during speculative execution. This makes it possible to
>>>> avoid adding unnecessary memory fences while also checking that no data
>>>> leak channels exist.
>>>> 
>>>> For example, adding such checks can show that allocating zeroed pages
>>>> can return speculatively non-zeroed pages (the first qword is not
>>>> zero).  [This might be a problem when the page-fault handler performs
>>>> software page-walk, for example.]
>>>> 
>>>> Introduce SPEC_WARN_ON(), which checks at runtime whether a certain
>>>> condition is violated during speculative execution. The condition should
>>>> be computed without branches, e.g., using bitwise operators. The check
>>>> will wait for the condition to be realized (i.e., not speculated), and
>>>> if the assertion is violated, a warning will be thrown.
>>>> 
>>>> Warnings can be provided in one of two modes: precise and imprecise.
>>>> Neither mode is perfect. The precise mode does not always make it easy
>>>> to understand which assertion was broken; instead, it points to a
>>>> location in the execution somewhere around the point at which the
>>>> assertion was violated. In addition, it prints a warning for each
>>>> violation (unlike WARN_ONCE(), which warns only once).
>>>> 
>>>> The imprecise mode, on the other hand, can sometimes throw the wrong
>>>> indication, specifically if the control flow has changed between the
>>>> speculative execution and the actual one. Note that this is not a
>>>> false positive; it just means that the output would mislead the user
>>>> into thinking the wrong assertion was broken.
>>>> 
>>>> There are some more limitations. Since the mechanism requires an
>>>> indirect branch, it should not be used in production systems that are
>>>> susceptible to Spectre v2. The mechanism requires TSX and performance
>>>> counters that are only available in Skylake+. There is a hidden
>>>> assumption that TSX is not used in the kernel for anything other than
>>>> this mechanism.
>>> 
>>> Nice trick!
>> 
>> "Illusion." [ ignore if you don’t know the reference ]
>> 
>>> Can you eliminate the indirect call by forcing an access fault to
>>> abort the transaction instead, e.g. "cmove 0, $1"?
>>> 
>>> (If this works, it may also allow support on older architectures as
>>> the RTM_RETIRED.ABORT* events go back further I believe?)
>> 
>> I don’t think it would work. The whole problem is that we need a counter
>> that is updated during execution and not retirement. I tried several
>> counters and could not find other appropriate ones.
>> 
>> The idea behind the implementation is to affect the control flow through
>> data dependency. I may be able to do something similar without an indirect
>> branch. I’ll take a page, put the XABORT on the page and make the page NX.
>> Then, a direct jump would go to this page. The conditional-mov would change
>> the PTE to X if the assertion is violated. There should be a page-walk even
>> if the CPU finds the entry in the TLB, since this entry is NX.
> 
> I think you only get a page walk if the TLB entry is not-present. I’d be a
> bit surprised if the CPU is willing to execute, even speculatively, from
> speculatively written data. Good luck!

I guess you are right (although I didn’t try). IIRC, Jann Horn once
explained to me that if CPUs used PTEs that were written speculatively, this
would have been a correctness issue, since the PTE needs to get to the TLB
before it is used.

I’ll try a different path (no concrete idea which yet), assuming there is
interest.
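
For readers following the thread: the RFC description quoted above asks that
the asserted condition be "computed without branches, e.g., using bitwise
operators". A minimal sketch of what that can look like in C follows. The
helper names (spec_nonzero, spec_page_first_qword_set, spec_both) are
hypothetical and are not taken from the patch; they only illustrate turning
the "first qword is not zero" example into a 0/1 value without if/else or
short-circuit operators.

```c
#include <stdint.h>

/*
 * Illustrative only: these helper names are hypothetical and are not
 * taken from the SPEC_WARN_ON() patch.
 */

/* Return 1 if v is non-zero and 0 otherwise, with no conditional in the source. */
static inline uint64_t spec_nonzero(uint64_t v)
{
	/* For any v != 0, either v or its two's-complement negation has bit 63 set. */
	return (v | (0 - v)) >> 63;
}

/* The "page is not zeroed" condition from the example: first qword is non-zero. */
static inline uint64_t spec_page_first_qword_set(const uint64_t *page)
{
	return spec_nonzero(page[0]);
}

/* Combine two 0/1 conditions with bitwise AND instead of short-circuiting &&. */
static inline uint64_t spec_both(uint64_t a, uint64_t b)
{
	return a & b;
}
```

Note that writing the condition this way only avoids branches at the source
level. Whether the compiler actually emits branch-free machine code would
still have to be checked in the generated assembly.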