linux-kernel - Re: [PATCH v6 7/8] x86: patch all traced function calls using the int3-based framework

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1390396845.16203.58.camel@pathway.suse.cz>
Date:	Wed, 22 Jan 2014 14:20:45 +0100
From:	Petr Mladek <pmladek@...e.cz>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Jiri Kosina <jkosina@...e.cz>, linux-kernel@...r.kernel.org,
	x86@...nel.org
Subject: Re: [PATCH v6 7/8] x86: patch all traced function calls using the
 int3-based framework

On Wed, 2014-01-15 at 10:47 -0500, Steven Rostedt wrote:
> On Tue, 10 Dec 2013 16:42:19 +0100
> Petr Mladek <pmladek@...e.cz> wrote:
> 
> > 
> > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> > index 03abf9abf681..e5c02af3a8cc 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -613,7 +613,7 @@ void __kprobes text_poke_or_die(void *addr, const void *opcode, size_t len)
> >   * responsible for setting the patched code read-write. This is more effective
> >   * only when you patch many addresses at the same time.
> >   */
> > -static int text_poke_part(void *addr, const void *opcode, size_t len)
> > +static int notrace text_poke_part(void *addr, const void *opcode, size_t len)
> 
> I can understand why you want to not trace any of these, but again, I
> hate using notrace unless it is absolutely necessary. That's what the
> set_ftrace_notrace is for. If you don't want to trace it, simply tell
> the kernel not to.

I removed "notrace" from all locations, except for "poke_int3_handler".
Then I tried to switch between 7 tracers, enable and disable tracing,
and repeated this 100 times. It slowed down from:

real    3m25.837s    3m29.280s    3m27.408s
user    0m0.004s     0m0.004s     0m0.000s
sys     0m3.532s     0m3.460s     0m3.416s

to 

real    5m23.294s    5m24.944s    5m28.056s
user    0m0.004s     0m0.004s	  0m0.004s
sys     0m3.076s     0m3.180s	  0m3.396s

It means a change by 60%. As you know indeed, the reason is that the
functions used during patching are called via the int3 handler once the
breakpoint is added.

I thought about patching these functions separately, e.g. using separate
list or extra filters with two round patching. But I think that it is
not worth the extra complexity.

I agree that it might be worth to keep the functions traceable. So, if
the slowness is acceptable for you, I am fine with it as well.

> >  {
> >  	int ret;
> >  
> > @@ -662,7 +662,7 @@ static void *bp_int3_handler, *bp_int3_addr;
> >  static size_t bp_int3_len;
> >  static void *(*bp_int3_is_handled)(const unsigned long ip);
> >  
> > -int poke_int3_handler(struct pt_regs *regs)
> > +int notrace poke_int3_handler(struct pt_regs *regs)
> >  {
> >  	/* bp_patching_in_progress */
> >  	smp_rmb();
[...]
> > @@ -192,44 +162,10 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> >  	old = ftrace_nop_replace();
> >  	new = ftrace_call_replace(ip, addr);
> >  
> > -	/* Should only be called when module is loaded */
> > -	return ftrace_modify_code_direct(rec->ip, old, new);
> > +	return ftrace_modify_code(ip, old, new);
> >  }
> >  
> >  /*
> > - * The modifying_ftrace_code is used to tell the breakpoint
> > - * handler to call ftrace_int3_handler(). If it fails to
> > - * call this handler for a breakpoint added by ftrace, then
> > - * the kernel may crash.
> > - *
> > - * As atomic_writes on x86 do not need a barrier, we do not
> > - * need to add smp_mb()s for this to work. It is also considered
> > - * that we can not read the modifying_ftrace_code before
> > - * executing the breakpoint. That would be quite remarkable if
> > - * it could do that. Here's the flow that is required:
> > - *
> > - *   CPU-0                          CPU-1
> > - *
> > - * atomic_inc(mfc);
> > - * write int3s
> > - *				<trap-int3> // implicit (r)mb
> > - *				if (atomic_read(mfc))
> > - *					call ftrace_int3_handler()
> > - *
> > - * Then when we are finished:
> > - *
> > - * atomic_dec(mfc);
> > - *
> > - * If we hit a breakpoint that was not set by ftrace, it does not
> > - * matter if ftrace_int3_handler() is called or not. It will
> > - * simply be ignored. But it is crucial that a ftrace nop/caller
> > - * breakpoint is handled. No other user should ever place a
> > - * breakpoint on an ftrace nop/caller location. It must only
> > - * be done by this code.
> > - */
> 
> The above is a nice comment. It should be copied over to the
> alternative.c file for the explanation of bp_patching_in_progress.

Well, text_poke is using barriers instead of the atomic variable. I
think that the barriers have two functions in text_poke stuff:

1) make sure that "bp_patching_in_progress" is enabled before the
breakpoint is added; it means that "poke_int3_handler" is opened to
handle the address before the related trap happens.

2) make sure that the variables, e.g. bp_int3_handler, bp_int3_add, have
correct values before they are accessed in "poke_int3_handler".

I think that the atomic variable has only the 1st function. It does not
solve synchronization of the other variables. I thin that it was not a
problem in ftrace because it patched almost static list of addresses
but it might be problem in the generic text_poke_bp.

> > -atomic_t modifying_ftrace_code __read_mostly;
> > -
> > -/*
> >   * Should never be called:
> >   *  As it is only called by __ftrace_replace_code() which is called by
> >   *  ftrace_replace_code() that x86 overrides, and by ftrace_update_code()


Best Regards,
Petr


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/