linux-kernel - Re: Official documentation from Intel stating that poking INT3 (single-byte) concurrently is OK ?

Message-ID: <d593fccb-aace-6611-c9c6-46049e2de817@efficios.com>
Date:   Wed, 22 Feb 2023 11:41:10 -0500
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        "H . Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Olivier Dion <odion@...icios.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Jiri Kosina <jkosina@...e.cz>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: Official documentation from Intel stating that poking INT3
 (single-byte) concurrently is OK ?

On 2023-02-22 04:20, Peter Zijlstra wrote:
> On Tue, Feb 21, 2023 at 01:42:58PM -0500, Mathieu Desnoyers wrote:
>> On 2023-02-21 12:50, Steven Rostedt wrote:
>>> On Tue, 21 Feb 2023 11:44:42 -0500
>>> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
>>>
>>>> Hi Peter,
>>>>
>>>> I have emails from you dating from a few years back unofficially stating
>>>> that it's OK to update the first byte of an instruction with a single-byte
>>>> int3 concurrently:
>>>>
>>>> https://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html
>>>>
>>>> It is referred in the original implementation of text_poke_bp():
>>>> commit fd4363fff3d9 ("x86: Introduce int3 (breakpoint)-based instruction patching")
>>>>
>>>> Olivier Dion is working on the libpatch [1,2] project aiming to use this
>>>> property for low-latency/low-overhead live code patching in user-space as
>>>> well, but we cannot find an official statement from Intel that guarantees
>>>> this breakpoint-bypass technique is indeed OK without stopping the world
>>>> while patching.
>>>>
>>>> Do you know where I could find an official statement of this guarantee ?
>>>>
>>>
>>> The fact that we have been using it for over 10 years without issue should
>>> be a good guarantee ;-)
>>>
>>> I know you probably prefer an official statement, and I thought they
>>> actually gave one, but can't seem to find it.
>>
>> I recall an in-person discussion with Peter Anvin shortly after he got the
>> official confirmation, but I cannot find any public trace of it. I suspect
>> Intel may have documented this internally only.
> 
> My 2ct, ISTR this also having been vetted by AMD, perhaps they did
> manage to write it down somewhere.

Good point! I did not find a statement specifically about the breakpoint 
bypass, but by piecing together the explanations from their manual, I 
think we can conclude that it is safe:

Based on AMD64 Architecture Programmer’s Manual Volume 2
7.6.1 Cache Organization and Operation
Cross-Modifying Code

The subsection "Asynchronous modification" describes in detail what 
happens if an instruction is updated while it is being executed 
concurrently. The good news is that there is no mention of an evil 
bogeyman triggering any kind of general protection fault when updating 
instructions concurrently with their execution. So inserting a 
single-byte breakpoint as the first byte of an instruction is just the 
simplest scenario covered by that section:

"Such modifications must be done via a single store to the target 
thread's instruction stream that is contained entirely within a 
naturally-aligned quadword, and is subject to the constraints given 
here. A key aspect is that, although the store is performed atomically, 
the affected quadword may be read more than once in the process of 
extracting instruction bytes from it. This can result in the following 
scenarios resulting from a single store:

[...]

2. A modification to one instruction A that changes it to two 
instructions A'-B will only result in execution of A'-B.

[...]"
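
To make the quoted constraint concrete, here is a minimal sketch of a 
check that a patch of a given length stays inside one naturally-aligned 
quadword. The helper name fits_in_aligned_quadword() is hypothetical and 
not taken from the kernel or from libpatch; it only illustrates the 
address arithmetic. A single-byte int3 store trivially passes it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): return true if the byte range
 * [addr, addr + len) lies entirely within one naturally-aligned 8-byte
 * quadword, which is the precondition quoted above for a single store.
 */
static bool fits_in_aligned_quadword(uintptr_t addr, size_t len)
{
    if (len == 0 || len > 8)
        return false;
    /* Offset within the quadword plus the length must not cross 8. */
    return (addr & 7u) + len <= 8;
}
```

For example, a 1-byte store at any address fits, whereas a 2-byte store 
starting at offset 7 of a quadword does not.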

Then there is the "Synchronous modification" section which basically 
describes how serializing instructions can be issued before proceeding 
to execute the modified instructions.

So AFAIU the XMC breakpoint insertion without stopping the world is 
covered by AMD's "Asynchronous modification" section. The rest of the 
breakpoint-bypass technique, which relies on serializing instructions 
issued through IPIs in the kernel and through membarrier sync-core in 
userspace, is guaranteed by the "Synchronous modification" section.
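
The ordering of the steps can be sketched as follows. This is a 
simplified illustration under stated assumptions, not the kernel's 
text_poke_bp() nor libpatch code: it operates on an ordinary byte buffer 
rather than executable memory, poke_bp_sketch() and sync_cores_stub() 
are hypothetical names, and the stub only counts calls where a real 
implementation would serialize all cores (IPIs in the kernel, membarrier 
sync-core in userspace). The breakpoint handler that redirects threads 
hitting the int3 is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC /* single-byte x86 breakpoint instruction */

/* Placeholder for "make every core execute a serializing instruction". */
static unsigned int sync_count;
static void sync_cores_stub(void)
{
    sync_count++;
}

/*
 * Hypothetical sketch: replace the len-byte instruction at insn with
 * new_insn using the breakpoint-bypass ordering.
 */
static void poke_bp_sketch(volatile uint8_t *insn, const uint8_t *new_insn,
                           size_t len, void (*sync_cores)(void))
{
    size_t i;

    if (len == 0)
        return;

    /* Step 1: single-byte store of int3 over the first byte. */
    insn[0] = INT3_OPCODE;
    sync_cores();

    /* Step 2: update every byte except the first, behind the int3. */
    for (i = 1; i < len; i++)
        insn[i] = new_insn[i];
    if (len > 1)
        sync_cores();

    /* Step 3: single-byte store replacing int3 with the new first byte. */
    insn[0] = new_insn[0];
    sync_cores();
}
```

Only steps 1 and 3 touch a byte that another thread may be fetching as 
the start of an instruction, and each is a single-byte store; step 2 
modifies bytes that are only reachable through the int3.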

Unfortunately I cannot find anything stated as clearly about 
asynchronous cross-modification of code in Intel's documentation.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com