linux-kernel - Re: [PATCH 3/3] x86/ftrace: Use text

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191023193442.35lhhrqnyn3bfwpq@ast-mbp.dhcp.thefacebook.com>
Date:   Wed, 23 Oct 2019 12:34:43 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Nadav Amit <nadav.amit@...il.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Song Liu <songliubraving@...com>,
        Masami Hiramatsu <mhiramat@...nel.org>
Subject: Re: [PATCH 3/3] x86/ftrace: Use text_poke()

On Wed, Oct 23, 2019 at 12:23:06PM -0400, Steven Rostedt wrote:
> On Tue, 22 Oct 2019 14:58:43 -0700
> Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
> 
> > Neither of two statements are true. The per-function generated trampoline
> > I'm talking about is bpf specific. For a function with two arguments it's just:
> > push rbp 
> > mov rbp, rsp
> > push rdi
> > push rsi
> > lea  rdi,[rbp-0x10]
> > call jited_bpf_prog
> > pop rsi
> > pop rdi
> > leave
> > ret
> > 
> > fentry's nop is replaced with call to the above.
> > That's it.
> > kprobe and live patching has no use out of it.
> > 
> 
> Below is a patch that allows you to do this, and you don't need to
> worry about patching the nops. And it also allows to you hook directly
> to any function and still allow kprobes and tracing on those same
> functions (as long as they don't modify the ip, but in the future, we
> may be able to allow that too!). And this code does not depend on
> Peter's code either.
> 
> All you need to do is:
> 
> 	register_ftrace_direct((unsigned long)func_you_want_to_trace,
> 			       (unsigned long)your_trampoline);
> 
> 
> I added to trace-event-samples.c:
> 
> void my_direct_func(raw_spinlock_t *lock)
> {
> 	trace_printk("taking %p\n", lock);
> }
> 
> extern void my_tramp(void *);
> 
> asm (
> "       .pushsection    .text, \"ax\", @progbits\n"
> "   my_tramp:"
> #if 1
> "       pushq %rbp\n"
> "	movq %rsp, %rbp\n"
> "	pushq %rdi\n"
> "	call my_direct_func\n"
> "	popq %rdi\n"
> "	leave\n"
> #endif
> "	ret\n"
> "       .popsection\n"
> );
> 
> 
> (the #if was for testing purposes)
> 
> And then in the module load and unload:
> 
> 	ret = register_ftrace_direct((unsigned long)do_raw_spin_lock,
> 				     (unsigned long)my_tramp);
> 
> 	unregister_ftrace_direct((unsigned long)do_raw_spin_lock,
> 				 (unsigned long)my_tramp);
> 
> respectively.
> 
> And what this does is if there's only a single callback to the
> registered function, it changes the nop in the function to call your
> trampoline directly (just like you want this to do!). But if we add
> another callback, a direct_ops ftrace_ops gets added to the list of the
> functions to go through, and this will set up the code to call your
> trampoline after it calls all the other callbacks.
> 
> The called trampoline will be called as if it was called directly from
> the nop.
> 
> OK, I wrote this up quickly, and it has some bugs, but nothing that
> can't be straighten out (specifically, the checks fail if you add a
> function trace to one of the direct callbacks, but this can be dealt
> with).
> 
> Note, the work needed to port this to other archs is rather minimal
> (just need to tweak the ftrace_regs_caller and have a way to pass back
> the call address via pt_regs that is not saved).
> 
> Alexei,
> 
> Would this work for you?

Yes!
Looks great. More comments below.

> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6adaf18b3365..de3372bd08ae 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -159,6 +159,7 @@ config X86
>  	select HAVE_DYNAMIC_FTRACE
>  	select HAVE_DYNAMIC_FTRACE_WITH_REGS
>  	select HAVE_DYNAMIC_FTRACE_WITH_IPMODIFY
> +	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>  	select HAVE_EBPF_JIT
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
>  	select HAVE_EISA
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index c38a66661576..34da1e424391 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -28,6 +28,12 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
>  	return addr;
>  }
>  
> +static inline void ftrace_set_call_func(struct pt_regs *regs, unsigned long addr)
> +{
> +	/* Emulate a call */
> +	regs->orig_ax = addr;

This probably needs a longer comment :)

> +	.if \make_call
> +	movq ORIG_RAX(%rsp), %rax
> +	movq %rax, MCOUNT_REG_SIZE-8(%rsp)

reading asm helps.

> +config HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> +	bool
> +
>  config HAVE_FTRACE_MCOUNT_RECORD
>  	bool
>  	help
> @@ -565,6 +568,11 @@ config DYNAMIC_FTRACE_WITH_IPMODIFY_ONLY
>  	depends on DYNAMIC_FTRACE
>  	depends on HAVE_DYNAMIC_FTRACE_WITH_IPMODIFY
>  
> +config DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> +	def_bool y
> +	depends on DYNAMIC_FTRACE
> +	depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS

It seems to me that it's a bit of overkill to add new config knob
for every ftrace feature.
HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS (that arch defined) would
be enough to check and return error in register_ftrace_direct()
right?

> -static struct ftrace_hash *
> -__ftrace_hash_move(struct ftrace_hash *src)
> +static void transfer_hash(struct ftrace_hash *dst, struct ftrace_hash *src)
>  {
>  	struct ftrace_func_entry *entry;
> -	struct hlist_node *tn;
>  	struct hlist_head *hhd;
> +	struct hlist_node *tn;
> +	int size;
> +	int i;
> +
> +	dst->flags = src->flags;
> +
> +	size = 1 << src->size_bits;
> +	for (i = 0; i < size; i++) {
> +		hhd = &src->buckets[i];
> +		hlist_for_each_entry_safe(entry, tn, hhd, hlist) {
> +			remove_hash_entry(src, entry);
> +			__add_hash_entry(dst, entry);

I don't quite follow why this is needed.
I thought alloc_and_copy_ftrace_hash() can already handle it.
If that is just unrelated cleanup then sure. Looks good.

> +struct ftrace_ops direct_ops = {
> +	.func		= call_direct_funcs,
> +	.flags		= FTRACE_OPS_FL_IPMODIFY | FTRACE_OPS_FL_RECURSION_SAFE
> +#if 1
> +					| FTRACE_OPS_FL_DIRECT
> +#endif
> +#ifndef CONFIG_DYNAMIC_FTRACE_WITH_IPMODIFY_ONLY
> +					| FTRACE_OPS_FL_SAVE_REGS
> +#endif

With FL_DIRECT the CONFIG_DYNAMIC_FTRACE_WITH_IPMODIFY_ONLY won't be needed, right ?
At least not for bpf use case.
Do you see livepatching using it or switching to FL_DIRECT too?

> +	ret = -ENOMEM;
> +	if (ftrace_hash_empty(direct_functions) ||
> +	    direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
> +		struct ftrace_hash *new_hash;
> +		int size = ftrace_hash_empty(direct_functions) ? 0 :
> +			direct_functions->count + 1;
> +		int bits;
> +
> +		if (size < 32)
> +			size = 32;
> +
> +		for (size /= 2; size; size >>= 1)
> +			bits++;
> +
> +		/* Don't allocate too much */
> +		if (bits > FTRACE_HASH_MAX_BITS)
> +			bits = FTRACE_HASH_MAX_BITS;
> +
> +		new_hash = alloc_ftrace_hash(bits);
> +		if (!new_hash)
> +			goto out_unlock;
> +
> +		transfer_hash(new_hash, direct_functions);
> +		free_ftrace_hash(direct_functions);
> +		direct_functions = new_hash;

That's probably racy, no?
ftrace_get_addr_new() is not holding direct_mutex that
protects direct_functions.

> +	if (!ret && !(direct_ops.flags & FTRACE_OPS_FL_ENABLED))
> +		ret = register_ftrace_function(&direct_ops);

Having single direct_ops is nice.

> @@ -2370,6 +2542,10 @@ unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec)
>  {
>  	struct ftrace_ops *ops;
>  
> +	if ((rec->flags & FTRACE_FL_DIRECT) &&
> +	    (ftrace_rec_count(rec) == 1))
> +		return find_rec_direct(rec->ip);
> +

I've started playing with this function as well to
implement 2nd nop approach I mentioned earlier.
I'm going to abandon it, since your approach is better.
It allows not only bpf, but anyone else to register direct.

I have one more question/request.
Looks like ftrace can be turned off with sysctl.
Which means that a person or a script can accidently turn it off
and all existing kprobe+bpf stuff that is ftrace based will
be silently switched off.
User services will be impacted and will make people unhappy.
I'm not sure how to solve the existing situation,
but for FL_DIRECT can you add a check that if particular
nop is modified to do direct call then the only interface to
turn it off is to call unregister_ftrace_direct().
There should no side channel to kill it.
ftrace_disable and ftrace_kill make me nervous too.
Fast forward a year and imagine few host critical bpf progs
are running in production and relying on FL_DIRECT.
Now somebody decided to test new ftrace feature and
it hit one of FTRACE_WARN_ON().
That will shutdown the whole ftrace and bpf progs
will stop executing. I'd like to avoid that.
May be I misread the code?