Message-ID: <20190506201014.484e7b65@oasis.local.home>
Date: Mon, 6 May 2019 20:10:14 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...capital.net>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Nicolai Stange <nstange@...e.de>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
"the arch/x86 maintainers" <x86@...nel.org>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Jiri Kosina <jikos@...nel.org>,
Miroslav Benes <mbenes@...e.cz>,
Petr Mladek <pmladek@...e.com>,
Joe Lawrence <joe.lawrence@...hat.com>,
Shuah Khan <shuah@...nel.org>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Mimi Zohar <zohar@...ux.ibm.com>,
Juergen Gross <jgross@...e.com>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Nayna Jain <nayna@...ux.ibm.com>,
Masahiro Yamada <yamada.masahiro@...ionext.com>,
Joerg Roedel <jroedel@...e.de>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@...r.kernel.org>, stable <stable@...r.kernel.org>,
Masami Hiramatsu <mhiramat@...nel.org>
Subject: Re: [RFC][PATCH 1/2] x86: Allow breakpoints to emulate call
functions
On Mon, 6 May 2019 15:31:57 -0700
Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Mon, May 6, 2019 at 3:06 PM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > Why are you emulating something different than what you are rewriting?
>
> Side note: I'm also finding another bug on the ftrace side, which is a
> simple race condition.
>
> In particular, the logic with 'modifying_ftrace_code' is fundamentally racy.
>
> What can happen is that on one CPU we rewrite one instruction:
>
>	ftrace_update_func = ip;
>	/* Make sure the breakpoints see the ftrace_update_func update */
>	smp_wmb();
>
>	/* See comment above by declaration of modifying_ftrace_code */
>	atomic_inc(&modifying_ftrace_code);
>
>	ret = ftrace_modify_code(ip, old, new);
>
>	atomic_dec(&modifying_ftrace_code);
>
> but then another CPU hits the 'int3' while the modification is
> going on, and takes the fault.
>
> The fault handler does this:
>
>	if (unlikely(atomic_read(&modifying_ftrace_code))..
>
> and sees that "yes, it's in the middle of modifying the ftrace code",
> and calls ftrace_int3_handler(). All good and "obviously correct" so
> far, no?
>
> HOWEVER. It's actually buggy. Because in the meantime, the CPU that
> was rewriting instructions has finished, and decrements
> modifying_ftrace_code, which doesn't hurt us (because we already saw
> that the int3 was due to the modification).
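For context, the handler under discussion, ftrace_int3_handler() in
arch/x86/kernel/ftrace.c, looks roughly like this in mainline before
this patch series (paraphrased, not verbatim):

int ftrace_int3_handler(struct pt_regs *regs)
{
	unsigned long ip;

	if (WARN_ON_ONCE(!regs))
		return 0;

	/* regs->ip points just past the 1-byte int3 */
	ip = regs->ip - 1;
	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
		return 0;

	/* Skip over the remaining bytes of the patched call site */
	regs->ip += MCOUNT_INSN_SIZE - 1;

	return 1;
}

Note that is_ftrace_caller() is just "ip == ftrace_update_func", which
is exactly the value race (b) below is about.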
But the CPU that was rewriting instructions does a run_sync() after
removing the int3:
static void run_sync(void)
{
	int enable_irqs;

	/* No need to sync if there's only one CPU */
	if (num_online_cpus() == 1)
		return;

	enable_irqs = irqs_disabled();

	/* We may be called with interrupts disabled (on bootup). */
	if (enable_irqs)
		local_irq_enable();
	on_each_cpu(do_sync_core, NULL, 1);
	if (enable_irqs)
		local_irq_disable();
}
That sends an IPI to all CPUs to make sure they no longer see the int3.
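The IPI callback itself is just a core-serializing instruction; in
arch/x86/kernel/ftrace.c it is essentially:

static void do_sync_core(void *data)
{
	/*
	 * sync_core() executes a serializing instruction, forcing the
	 * CPU to flush its pipeline and re-fetch the instruction
	 * stream, so any stale copy of the patched bytes is dropped.
	 */
	sync_core();
}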
>
> BUT! There are two different races here:
>
> (a) maybe the fault handling was slow, and we saw the 'int3' and took
> the fault, but the modifying CPU had already finished, so that
> atomic_read(&modifying_ftrace_code) didn't actually trigger at all.
>
> (b) maybe the int3-faulting CPU *did* see the proper value of
> modifying_ftrace_code, but the modifying CPU went on and started
> *another* modification, and has changed ftrace_update_func in the
> meantime, so now the int3 handling is looking at the wrong values!
>
> In the case of (a), we'll die with an oops due to the inexplicable
> 'int3' we hit. And in the case of (b) we'll be fixing up using the
> wrong address.
>
> Things like this are why I'm wondering how much of the problem is due
> to the entry code, and how much of it is due simply to races and
> timing differences?
>
> Again, I don't actually know the ftrace code, and maybe I'm missing
> something, but this really looks like _another_ fundamental bug.
>
> The way to handle that modifying_ftrace_code thing is most likely by
> using a sequence counter. For example, one way to actually do
> something like this might be:
>
>	ftrace_update_func = ip;
>	ftrace_update_target = func;
>	smp_wmb();
>	atomic_inc(&modifying_ftrace_head);
>
>	ret = ftrace_modify_code(ip, old, new);
>
>	atomic_inc(&modifying_ftrace_tail);
>	smp_wmb();
>
> and now the int3 code could do something like
>
>	int head, tail;
>
>	head = atomic_read(&modifying_ftrace_head);
>	smp_rmb();
>	tail = atomic_read(&modifying_ftrace_tail);
>
>	/* Are we still in the process of modification? */
>	if (unlikely(head != tail+1))
>		return 0;
>
>	ip = ftrace_update_func;
>	func = ftrace_update_target;
>	smp_rmb();
>	/* Need to re-check that the above two values are consistent
>	   and we didn't exit in the meantime */
>	if (atomic_read(&modifying_ftrace_tail) != tail)
>		return 0;
>
>	int3_emulate_call(regs, ip, func);
>	return 1;
>
> although it would probably be better to use a seqcount instead of
> open-coding it like the above.
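A rough sketch of how that head/tail pair maps onto a bare seqcount_t
(the name ftrace_update_seq and the surrounding glue are illustrative,
not from any posted patch; note it inverts the usual seqcount pattern,
since the int3 handler only acts while a write is in flight):

static seqcount_t ftrace_update_seq = SEQCNT_ZERO(ftrace_update_seq);

	/* writer: the CPU doing the modification */
	write_seqcount_begin(&ftrace_update_seq);	/* count goes odd */
	ftrace_update_func = ip;
	ftrace_update_target = func;
	ret = ftrace_modify_code(ip, old, new);
	write_seqcount_end(&ftrace_update_seq);		/* count goes even */

	/* reader: the int3 handler */
	unsigned int seq = raw_read_seqcount(&ftrace_update_seq);

	if (!(seq & 1))
		return 0;	/* no modification in flight: not our int3 */
	ip = ftrace_update_func;
	func = ftrace_update_target;
	if (read_seqcount_retry(&ftrace_update_seq, seq))
		return 0;	/* writer moved on: values may be stale */

The odd/even test replaces the "head != tail+1" check, and
read_seqcount_retry() replaces the manual re-read of the tail.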
>
> NOTE! The above only fixes the (b) race. The (a) race is probably best
> handled by actually checking if the 'int3' instruction is still there
> before dying.
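For (a), that check could look something like this in the int3 handler
(probe_kernel_read() is the real interface; the surrounding logic is
just a sketch):

	unsigned char opcode;
	unsigned long ip = regs->ip - 1;	/* address of the trapping int3 */

	if (probe_kernel_read(&opcode, (void *)ip, 1))
		return 0;		/* unreadable: let it oops */
	if (opcode != 0xcc) {
		/*
		 * The writer already restored the original instruction,
		 * so back up and re-execute it instead of dying.
		 */
		regs->ip = ip;
		return 1;
	}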
>
> Again, maybe there's something I'm missing, but having looked at that
> patch now what feels like a million times, I'm finding more worrisome
> things in the ftrace code than in the kernel entry code..
I think you are missing the run_sync(), which is the heavy hammer that
makes sure all CPUs are in sync. It is done at each stage (see the
sketch after the list):
	add int3
	run_sync()
	update the rest of the call site behind the int3
	run_sync()
	remove int3
	run_sync()
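Condensed into code, the sequence has roughly this shape (a sketch
only: the helper name is illustrative, error handling is elided, and
the real code goes through ftrace_write() since the kernel text is
normally write-protected):

static int update_call_site(unsigned long ip, const char *new_insn)
{
	const unsigned char brk = 0xcc;		/* int3 opcode */

	/* 1: turn the first byte of the call into a breakpoint */
	probe_kernel_write((void *)ip, &brk, 1);
	run_sync();

	/* 2: rewrite the remaining bytes, hidden behind the int3 */
	probe_kernel_write((void *)(ip + 1), new_insn + 1,
			   MCOUNT_INSN_SIZE - 1);
	run_sync();

	/* 3: replace the int3 with the new first byte */
	probe_kernel_write((void *)ip, new_insn, 1);
	run_sync();

	return 0;
}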
HPA said that the last run_sync() isn't needed, but I kept it because I
wanted to make sure. Looks like your analysis shows that it is needed.
-- Steve