linux-kernel - Re: [PATCH 1/4] jump label - make init_kernel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20091007023251.GA4664@Krystal>
Date:	Tue, 6 Oct 2009 22:32:52 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Ingo Molnar <mingo@...e.hu>, Jason Baron <jbaron@...hat.com>,
	linux-kernel@...r.kernel.org, tglx@...utronix.de, ak@...e.de,
	roland@...hat.com, rth@...hat.com, mhiramat@...hat.com
Subject: Re: [PATCH 1/4] jump label - make init_kernel_text() global

* Steven Rostedt (rostedt@...dmis.org) wrote:
> On Sat, 2009-10-03 at 08:39 -0400, Mathieu Desnoyers wrote:
> 
> > I might be missing a bit of context here, I just want to make sure we
> > are on the same page: patching a jmp instruction is safe on UP, safe
> > with stop_machine(), is very likely safe with the breakpoint-ipi
> 
> Hi Mathieu,
> 
> I've been reading through these threads (both this one and the immediate
> one) and I'm still a bit confused. I really want to understand this in a
> simple way, thus make sure everyone else understands it too.
> 
> >From what Arjan said here:
> 
>   http://lkml.org/lkml/2009/9/25/98
> 
> The issue is going back from the int3 to the old value. How does the
> breakpoint-ipi work?
> 
> Supposedly, we can add an int3 to the code without any worry. If another
> CPU at that same time hits that code path, it will either run the old
> code, or take the interrupt. The breakpoint interrupt handler, will
> handle that code path, and the execution continues.
> 
> Now what is the issues with removing the int3 and placing back the old
> (or new) value. Is there an issue if another CPU is about to execute
> that code path as we remove the int3? If so, how does sending an IPI
> help the matter without adding more races?
> 
> Is there only an issue if we change the old value with something else,
> and you just need to send the IPI after you modify the old code and
> before removing the int3?
> 
> I may just be totally confused, which I usually am. But when I'm not
> confused, I feel that the code is practical ;-)
> 

Hi Steven,

OK, I'll make the explanation as straightforward as possible. I'll use a
race example to illustrate what we try to avoid by using the
breakpoint+ipi scheme. After that, I present the same scenario with the
breakpoint+ipi in place.

Each step shows what is executed, and what is the memory values seen by
the CPU. CPU A is doing the code patching, CPU B executing the code.
I intentionally left out some sfence required on CPU A for simplicity.)

Initially, let's say we have:
(1)  (2)
0xeb 0xe5    (jmp to offset 0xe5)

And we want to change this to:
(1)  (2)
0xeb 0xf0    (jmp to offset 0xf0)

(scenario "buggy")

CPU A       |       CPU B  (this is about as far as my ascii-art skills go)
-------------------------    ;)
0xeb 0xe5     0xeb 0xe5
0:            CPU B instruction pointer is earlier than (1)
              CPU B pipeline speculatively predicts branches,
              prefetches data, calculates speculated values.
1:            CPU B loads 0xeb
2:            CPU B loads 0xe5
3:
Write to (2)
0xeb 0xf0     0xeb 0xf0
4:            CPU B instruction pointer gets to (1), needs to validate
              all the pipeline speculation.
              But ! The CPU does not expect code to change underneath.
              General protection fault (or any other fault.. random..)


Now with the breakpoint+ipi/mb() scheme:
(scenario A: CPU B does not hit the breakpoint)

CPU A       |       CPU B
-------------------------
0xeb 0xe5     0xeb 0xe5
0:            CPU B instruction pointer is earlier than (1)
              CPU B pipeline speculatively predicts branches,
              prefetches data, calculates speculated values.
1:            CPU B loads 0xeb
2:            CPU B loads 0xe5
3:
Write to (1)
0xcc 0xe5     0xcc 0xe5  # breakpoint inserted
4: send IPI
5:            mfence     # serializing instruction. Flushes CPU B's
                         # pipeline
6:
Write to (2)
0xcc 0xf0     0xcc 0xf0
7:
Write to (1)
0xeb 0xf0     0xeb 0xf0
8:            CPU B instruction pointer gets to (1), needs to validate
              all the pipeline speculation. Because we flushed any
              speculation prior to the mfence, we're ok.


Now, I'll show why just using the breakpoint, without IPI, is
problematic:

CPU A       |       CPU B
-------------------------
0xeb 0xe5     0xeb 0xe5
0:            CPU B instruction pointer is earlier than (1)
              CPU B pipeline speculatively predicts branches,
              prefetches data, calculates speculated values.
1:            CPU B loads 0xeb
2:            CPU B loads 0xe5
3:
Write to (1)
0xcc 0xe5     0xcc 0xf0  # breakpoint inserted
4:
Write to (2)
0xcc 0xf0     0xeb 0xf0  # Silly CPU B. Did not see nor use the breakpoint.
                         # Same problem as scenario "buggy".
5:
Write to (1)
0xeb 0xf0     0xeb 0xf0
4:            CPU B instruction pointer gets to (1), needs to validate
              all the pipeline speculation.
              But ! The CPU does not expect code to change underneath.
              General protection fault (or any other fault.. random..)

So, basically, we ensure that the only transitions CPU B will see are
either:

0xeb 0xe5 -> 0xcc 0xe5 : OK, adding breakpoint
0xcc 0xe5 -> 0xcc 0xf0 : OK, not using the operand anyway, it's a
                             breakpoint!
0xcc 0xf0 -> 0xeb 0xf0 : OK, removing breakpoint

*but*, the transition we guarantee that CPU B will *never* see without
having a mfence executed between the old and the new version is:

0xeb 0xe5 -> 0xeb 0xf0  <----- buggy.

Hope the explanation helps,

Mathieu


> -- Steve
> 
>  
> 
> > approach (but we need the confirmation from Intel, which hpa is trying
> > to get), but is definitely _not_ safe if neither of these methods are
> > used on a SMP system. If a non-aligned multi-word jump is modified while
> > another CPU is fetching the instruction, bad things could happen.
> 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/