lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 27 May 2024 09:37:47 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, Masami
 Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Andrew Morton
 <akpm@...ux-foundation.org>, Alexei Starovoitov
 <alexei.starovoitov@...il.com>, Florent Revest <revest@...omium.org>,
 Martin KaFai Lau <martin.lau@...ux.dev>, bpf <bpf@...r.kernel.org>, Sven
 Schnelle <svens@...ux.ibm.com>, Alexei Starovoitov <ast@...nel.org>, Jiri
 Olsa <jolsa@...nel.org>, Arnaldo Carvalho de Melo <acme@...nel.org>, Daniel
 Borkmann <daniel@...earbox.net>, Alan Maguire <alan.maguire@...cle.com>,
 Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner
 <tglx@...utronix.de>, Guo Ren <guoren@...nel.org>
Subject: Re: [PATCH 00/20] function_graph: Allow multiple users for function
 graph tracing

On Fri, 24 May 2024 22:36:52 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:

> [
>   Resend for some of you as I missed a comma in the Cc and
>   quilt died sending this out.
> ]
> 
> This is a continuation of the function graph multi user code.
> I wrote a proof of concept back in 2019 of this code[1] and
> Masami started cleaning it up. I started from Masami's work v10
> that can be found here:
> 
>  https://lore.kernel.org/linux-trace-kernel/171509088006.162236.7227326999861366050.stgit@devnote2/
> 
> This is *only* the code that allows multiple users of function
> graph tracing. This is not the fprobe work that Masami is working
> to add on top of it. As Masami took my proof of concept, there
> was still several things I disliked about that code. Instead of
> having Masami clean it up even more, I decided to take over on just
> my code and change it up a bit.
> 
> The biggest changes from where Masami left off is mostly renaming more
> variables, macros, and function names. I fixed up the current comments
> and added more to make the code a bit more understandable.
> 
> At the end of the series, I added two patches to optimize the entry
> and exit. On entry, there was a loop that iterated the 16 elements
> of the fgraph_array[] looking for any that may have a gops registered
> to it. It's quite a waste to do that loop if there's only one
> registered user. To fix that, I added a fgraph_array_bitmask that has
> its bits set that correspond to the elements of the array. Then
> a simple for_each_set_bit() is used for the iteration. I do the same
> thing at the exit callback of the function where it iterates over the
> bits of the bitmap saved on the ret_stack.
> 
> I also noticed that Masami added code to handle tail calls in the
> unwinder and had that in one of my patches. I took that code out
> and made it a separate patch with Masami as the author.
> 
> The diff between this and Masami's last update is at the end of this email.

Thanks for update the patches!
I think your changes are good. I just have some comments and replied.
So, the plan is to push this series in the tracing/for-next? I will
port my fprobe part on it and run some tests.

Thank you,

> 
> Based on Linus commit: 0eb03c7e8e2a4cc3653eb5eeb2d2001182071215
> 
> [1] https://lore.kernel.org/all/20190525031633.811342628@goodmis.org/
> 
> Masami Hiramatsu (Google) (3):
>       function_graph: Handle tail calls for stack unwinding
>       function_graph: Use a simple LRU for fgraph_array index number
>       ftrace: Add multiple fgraph storage selftest
> 
> Steven Rostedt (Google) (2):
>       function_graph: Use for_each_set_bit() in __ftrace_return_to_handler()
>       function_graph: Use bitmask to loop on fgraph entry
> 
> Steven Rostedt (VMware) (15):
>       function_graph: Convert ret_stack to a series of longs
>       fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long
>       function_graph: Add an array structure that will allow multiple callbacks
>       function_graph: Allow multiple users to attach to function graph
>       function_graph: Remove logic around ftrace_graph_entry and return
>       ftrace/function_graph: Pass fgraph_ops to function graph callbacks
>       ftrace: Allow function_graph tracer to be enabled in instances
>       ftrace: Allow ftrace startup flags to exist without dynamic ftrace
>       function_graph: Have the instances use their own ftrace_ops for filtering
>       function_graph: Add "task variables" per task for fgraph_ops
>       function_graph: Move set_graph_function tests to shadow stack global var
>       function_graph: Move graph depth stored data to shadow stack global var
>       function_graph: Move graph notrace bit to shadow stack global var
>       function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
>       function_graph: Add selftest for passing local variables
> 
> ----
>  arch/arm64/kernel/ftrace.c           |  21 +-
>  arch/loongarch/kernel/ftrace_dyn.c   |  15 +-
>  arch/powerpc/kernel/trace/ftrace.c   |   3 +-
>  arch/riscv/kernel/ftrace.c           |  15 +-
>  arch/x86/kernel/ftrace.c             |  19 +-
>  include/linux/ftrace.h               |  42 +-
>  include/linux/sched.h                |   2 +-
>  include/linux/trace_recursion.h      |  39 --
>  kernel/trace/fgraph.c                | 994 ++++++++++++++++++++++++++++-------
>  kernel/trace/ftrace.c                |  11 +-
>  kernel/trace/ftrace_internal.h       |   2 -
>  kernel/trace/trace.h                 |  94 +++-
>  kernel/trace/trace_functions.c       |   8 +
>  kernel/trace/trace_functions_graph.c |  96 ++--
>  kernel/trace/trace_irqsoff.c         |  10 +-
>  kernel/trace/trace_sched_wakeup.c    |  10 +-
>  kernel/trace/trace_selftest.c        | 259 ++++++++-
>  17 files changed, 1330 insertions(+), 310 deletions(-)
> 
> 
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index 3313e4b83aa2..1aae521e5997 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -18,29 +18,36 @@
>  #include "ftrace_internal.h"
>  #include "trace.h"
>  
> -#define FGRAPH_FRAME_SIZE sizeof(struct ftrace_ret_stack)
> -#define FGRAPH_FRAME_OFFSET DIV_ROUND_UP(FGRAPH_FRAME_SIZE, sizeof(long))
>  
>  /*
> - * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
> - * is allocated on the task's ret_stack with bitmap entry, then each
> - * fgraph_ops on the fgraph_array[]'s entryfunc is called and if that returns
> - * non-zero, the index into the fgraph_array[] for that fgraph_ops is recorded
> - * on the bitmap entry as a bit flag.
> - *
> - * The top of the ret_stack (when not empty) will always have a reference
> - * to the last ftrace_ret_stack saved. All references to the
> - * ftrace_ret_stack has the format of:
> + * FGRAPH_FRAME_SIZE:	Size in bytes of the meta data on the shadow stack
> + * FGRAPH_FRAME_OFFSET:	Size in long words of the meta data frame
> + */
> +#define FGRAPH_FRAME_SIZE	sizeof(struct ftrace_ret_stack)
> +#define FGRAPH_FRAME_OFFSET	DIV_ROUND_UP(FGRAPH_FRAME_SIZE, sizeof(long))
> +
> +/*
> + * On entry to a function (via function_graph_enter()), a new fgraph frame
> + * (ftrace_ret_stack) is pushed onto the stack as well as a word that
> + * holds a bitmask and a type (called "bitmap"). The bitmap is defined as:
>   *
>   * bits:  0 -  9	offset in words from the previous ftrace_ret_stack
> - *			(bitmap type should have FGRAPH_FRAME_OFFSET always)
> + *
>   * bits: 10 - 11	Type of storage
>   *			  0 - reserved
>   *			  1 - bitmap of fgraph_array index
>   *			  2 - reserved data
>   *
> - * For bitmap of fgraph_array index
> + * For type with "bitmap of fgraph_array index" (FGRAPH_TYPE_BITMAP):
>   *  bits: 12 - 27	The bitmap of fgraph_ops fgraph_array index
> + *			That is, it's a bitmask of 0-15 (16 bits)
> + *			where if a corresponding ops in the fgraph_array[]
> + *			expects a callback from the return of the function
> + *			it's corresponding bit will be set.
> + *
> + *
> + * The top of the ret_stack (when not empty) will always have a reference
> + * word that points to the last fgraph frame that was saved.
>   *
>   * For reserved data:
>   *  bits: 12 - 17	The size in words that is stored
> @@ -52,7 +59,7 @@
>   * stored two words of data, this is what will be on the task's shadow
>   * ret_stack: (the stack grows upward)
>   *
> - *  ret_stack[SHADOW_STACK_IN_WORD]
> + *  ret_stack[SHADOW_STACK_OFFSET]
>   * | SHADOW_STACK_TASK_VARS(ret_stack)[15]      |
>   * ...
>   * | SHADOW_STACK_TASK_VARS(ret_stack)[0]       |
> @@ -60,6 +67,8 @@
>   * ...
>   * |                                            | <- task->curr_ret_stack
>   * +--------------------------------------------+
> + * | (3 << 12) | (3 << 10) | FGRAPH_FRAME_OFFSET|
> + * |         *or put another way*               |
>   * | (3 << FGRAPH_DATA_INDEX_SHIFT)| \          | This is for fgraph_ops[3].
>   * | ((2 - 1) << FGRAPH_DATA_SHIFT)| \          | The data size is 2 words.
>   * | (FGRAPH_TYPE_DATA << FGRAPH_TYPE_SHIFT)| \ |
> @@ -67,7 +76,9 @@
>   * +--------------------------------------------+ ( It is 4 words from the ret_stack)
>   * |            STORED DATA WORD 2              |
>   * |            STORED DATA WORD 1              |
> - * +------i-------------------------------------+
> + * +--------------------------------------------+
> + * | (9 << 12) | (1 << 10) | FGRAPH_FRAME_OFFSET|
> + * |         *or put another way*               |
>   * | (BIT(3)|BIT(0)) << FGRAPH_INDEX_SHIFT | \  |
>   * | FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT| \ |
>   * | (offset1:FGRAPH_FRAME_OFFSET)              | <- the offset1 is from here
> @@ -82,17 +93,24 @@
>   *
>   * If a backtrace is required, and the real return pointer needs to be
>   * fetched, then it looks at the task's curr_ret_stack offset, if it
> - * is greater than zero (reserved, or right before poped), it would mask
> + * is greater than zero (reserved, or right before popped), it would mask
>   * the value by FGRAPH_FRAME_OFFSET_MASK to get the offset of the
>   * ftrace_ret_stack structure stored on the shadow stack.
>   */
>  
> -#define FGRAPH_FRAME_OFFSET_SIZE	10
> -#define FGRAPH_FRAME_OFFSET_MASK	GENMASK(FGRAPH_FRAME_OFFSET_SIZE - 1, 0)
> +/*
> + * The following is for the top word on the stack:
> + *
> + *   FGRAPH_FRAME_OFFSET (0-9) holds the offset delta to the fgraph frame
> + *   FGRAPH_TYPE (10-11) holds the type of word this is.
> + *     (RESERVED or BITMAP)
> + */
> +#define FGRAPH_FRAME_OFFSET_BITS	10
> +#define FGRAPH_FRAME_OFFSET_MASK	GENMASK(FGRAPH_FRAME_OFFSET_BITS - 1, 0)
>  
> -#define FGRAPH_TYPE_SIZE	2
> -#define FGRAPH_TYPE_MASK	GENMASK(FGRAPH_TYPE_SIZE - 1, 0)
> -#define FGRAPH_TYPE_SHIFT	FGRAPH_FRAME_OFFSET_SIZE
> +#define FGRAPH_TYPE_BITS	2
> +#define FGRAPH_TYPE_MASK	GENMASK(FGRAPH_TYPE_BITS - 1, 0)
> +#define FGRAPH_TYPE_SHIFT	FGRAPH_FRAME_OFFSET_BITS
>  
>  enum {
>  	FGRAPH_TYPE_RESERVED	= 0,
> @@ -100,31 +118,48 @@ enum {
>  	FGRAPH_TYPE_DATA	= 2,
>  };
>  
> -#define FGRAPH_INDEX_SIZE	16
> -#define FGRAPH_INDEX_MASK	GENMASK(FGRAPH_INDEX_SIZE - 1, 0)
> -#define FGRAPH_INDEX_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
> +/*
> + * For BITMAP type:
> + *   FGRAPH_INDEX (12-27) bits holding the gops index wanting return callback called
> + */
> +#define FGRAPH_INDEX_BITS	16
> +#define FGRAPH_INDEX_MASK	GENMASK(FGRAPH_INDEX_BITS - 1, 0)
> +#define FGRAPH_INDEX_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_BITS)
>  
> -/* The data size == 0 means 1 word, and 31 (=2^5 - 1) means 32 words. */
> -#define FGRAPH_DATA_SIZE	5
> -#define FGRAPH_DATA_MASK	GENMASK(FGRAPH_DATA_SIZE - 1, 0)
> -#define FGRAPH_DATA_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
> -#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * (1 << FGRAPH_DATA_SIZE))
> +/*
> + * For DATA type:
> + *  FGRAPH_DATA (12-17) bits hold the size of data (in words)
> + *  FGRAPH_INDEX (18-23) bits hold the index for which gops->idx the data is for
> + *
> + * Note:
> + *  data_size == 0 means 1 word, and 31 (=2^5 - 1) means 32 words.
> + */
> +#define FGRAPH_DATA_BITS	5
> +#define FGRAPH_DATA_MASK	GENMASK(FGRAPH_DATA_BITS - 1, 0)
> +#define FGRAPH_DATA_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_BITS)
> +#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * (1 << FGRAPH_DATA_BITS))
>  
> -#define FGRAPH_DATA_INDEX_SIZE	4
> -#define FGRAPH_DATA_INDEX_MASK	GENMASK(FGRAPH_DATA_INDEX_SIZE - 1, 0)
> -#define FGRAPH_DATA_INDEX_SHIFT	(FGRAPH_DATA_SHIFT + FGRAPH_DATA_SIZE)
> +#define FGRAPH_DATA_INDEX_BITS	4
> +#define FGRAPH_DATA_INDEX_MASK	GENMASK(FGRAPH_DATA_INDEX_BITS - 1, 0)
> +#define FGRAPH_DATA_INDEX_SHIFT	(FGRAPH_DATA_SHIFT + FGRAPH_DATA_BITS)
>  
>  #define FGRAPH_MAX_INDEX	\
> -	((FGRAPH_INDEX_SIZE << FGRAPH_DATA_SIZE) + FGRAPH_RET_INDEX)
> +	((FGRAPH_INDEX_SIZE << FGRAPH_DATA_BITS) + FGRAPH_RET_INDEX)
>  
> -#define FGRAPH_ARRAY_SIZE	FGRAPH_INDEX_SIZE
> +#define FGRAPH_ARRAY_SIZE	FGRAPH_INDEX_BITS
>  
> -#define SHADOW_STACK_SIZE (PAGE_SIZE)
> -#define SHADOW_STACK_IN_WORD (SHADOW_STACK_SIZE / sizeof(long))
> +/*
> + * SHADOW_STACK_SIZE:	The size in bytes of the entire shadow stack
> + * SHADOW_STACK_OFFSET:	The size in long words of the shadow stack
> + * SHADOW_STACK_MAX_OFFSET: The max offset of the stack for a new frame to be added
> + */
> +#define SHADOW_STACK_SIZE	(PAGE_SIZE)
> +#define SHADOW_STACK_OFFSET	(SHADOW_STACK_SIZE / sizeof(long))
>  /* Leave on a buffer at the end */
>  #define SHADOW_STACK_MAX_OFFSET				\
> -	(SHADOW_STACK_IN_WORD - (FGRAPH_FRAME_OFFSET + 1 + FGRAPH_ARRAY_SIZE))
> +	(SHADOW_STACK_OFFSET - (FGRAPH_FRAME_OFFSET + 1 + FGRAPH_ARRAY_SIZE))
>  
> +/* RET_STACK():		Return the frame from a given @offset from task @t */
>  #define RET_STACK(t, offset) ((struct ftrace_ret_stack *)(&(t)->ret_stack[offset]))
>  
>  /*
> @@ -132,12 +167,13 @@ enum {
>   * ret_stack to store task specific state.
>   */
>  #define SHADOW_STACK_TASK_VARS(ret_stack) \
> -	((unsigned long *)(&(ret_stack)[SHADOW_STACK_IN_WORD - FGRAPH_ARRAY_SIZE]))
> +	((unsigned long *)(&(ret_stack)[SHADOW_STACK_OFFSET - FGRAPH_ARRAY_SIZE]))
>  
>  DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
>  int ftrace_graph_active;
>  
>  static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
> +static unsigned long fgraph_array_bitmask;
>  
>  /* LRU index table for fgraph_array */
>  static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
> @@ -162,6 +198,8 @@ static int fgraph_lru_release_index(int idx)
>  
>  	fgraph_lru_table[fgraph_lru_last] = idx;
>  	fgraph_lru_last = (fgraph_lru_last + 1) % FGRAPH_ARRAY_SIZE;
> +
> +	clear_bit(idx, &fgraph_array_bitmask);
>  	return 0;
>  }
>  
> @@ -176,70 +214,77 @@ static int fgraph_lru_alloc_index(void)
>  
>  	fgraph_lru_table[fgraph_lru_next] = -1;
>  	fgraph_lru_next = (fgraph_lru_next + 1) % FGRAPH_ARRAY_SIZE;
> +
> +	set_bit(idx, &fgraph_array_bitmask);
>  	return idx;
>  }
>  
> +/* Get the offset to the fgraph frame from a ret_stack value */
>  static inline int __get_offset(unsigned long val)
>  {
>  	return val & FGRAPH_FRAME_OFFSET_MASK;
>  }
>  
> +/* Get the type of word from a ret_stack value */
>  static inline int __get_type(unsigned long val)
>  {
>  	return (val >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
>  }
>  
> +/* Get the data_index for a DATA type ret_stack word */
>  static inline int __get_data_index(unsigned long val)
>  {
>  	return (val >> FGRAPH_DATA_INDEX_SHIFT) & FGRAPH_DATA_INDEX_MASK;
>  }
>  
> +/* Get the data_size for a DATA type ret_stack word */
>  static inline int __get_data_size(unsigned long val)
>  {
>  	return ((val >> FGRAPH_DATA_SHIFT) & FGRAPH_DATA_MASK) + 1;
>  }
>  
> +/* Get the word from the ret_stack at @offset */
>  static inline unsigned long get_fgraph_entry(struct task_struct *t, int offset)
>  {
>  	return t->ret_stack[offset];
>  }
>  
> +/* Get the FRAME_OFFSET from the word from the @offset on ret_stack */
>  static inline int get_frame_offset(struct task_struct *t, int offset)
>  {
>  	return __get_offset(t->ret_stack[offset]);
>  }
>  
> +/* Get FGRAPH_TYPE from the word from the @offset at ret_stack */
>  static inline int get_fgraph_type(struct task_struct *t, int offset)
>  {
>  	return __get_type(t->ret_stack[offset]);
>  }
>  
> +/* For BITMAP type: get the bitmask from the @offset at ret_stack */
>  static inline unsigned long
> -get_fgraph_index_bitmap(struct task_struct *t, int offset)
> +get_bitmap_bits(struct task_struct *t, int offset)
>  {
>  	return (t->ret_stack[offset] >> FGRAPH_INDEX_SHIFT) & FGRAPH_INDEX_MASK;
>  }
>  
> +/* For BITMAP type: set the bits in the bitmap bitmask at @offset on ret_stack */
>  static inline void
> -set_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
> -{
> -	t->ret_stack[offset] = (bitmap << FGRAPH_INDEX_SHIFT) |
> -		(FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT) | FGRAPH_FRAME_OFFSET;
> -}
> -
> -static inline bool is_fgraph_index_set(struct task_struct *t, int offset, int idx)
> +set_bitmap_bits(struct task_struct *t, int offset, unsigned long bitmap)
>  {
> -	return !!(get_fgraph_index_bitmap(t, offset) & BIT(idx));
> +	t->ret_stack[offset] |= (bitmap << FGRAPH_INDEX_SHIFT);
>  }
>  
> +/* Write the bitmap to the ret_stack at @offset (does index, offset and bitmask) */
>  static inline void
> -add_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
> +set_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
>  {
> -	t->ret_stack[offset] |= (bitmap << FGRAPH_INDEX_SHIFT);
> +	t->ret_stack[offset] = (bitmap << FGRAPH_INDEX_SHIFT) |
> +		(FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT) | FGRAPH_FRAME_OFFSET;
>  }
>  
> -/* Note: @offset is the offset of FGRAPH_TYPE_DATA entry. */
> -static inline void *get_fgraph_data(struct task_struct *t, int offset)
> +/* For DATA type: get the data saved under the ret_stack word at @offset */
> +static inline void *get_data_type_data(struct task_struct *t, int offset)
>  {
>  	unsigned long val = t->ret_stack[offset];
>  
> @@ -249,7 +294,8 @@ static inline void *get_fgraph_data(struct task_struct *t, int offset)
>  	return (void *)&t->ret_stack[offset];
>  }
>  
> -static inline unsigned long make_fgraph_data(int idx, int size, int offset)
> +/* Create the ret_stack word for a DATA type */
> +static inline unsigned long make_data_type_val(int idx, int size, int offset)
>  {
>  	return (idx << FGRAPH_DATA_INDEX_SHIFT) |
>  		((size - 1) << FGRAPH_DATA_SHIFT) |
> @@ -326,7 +372,7 @@ void *fgraph_reserve_data(int idx, int size_bytes)
>  	if (unlikely(curr_ret_stack >= SHADOW_STACK_MAX_OFFSET))
>  		return NULL;
>  
> -	val = make_fgraph_data(idx, data_size, __get_offset(val) + data_size + 1);
> +	val = make_data_type_val(idx, data_size, __get_offset(val) + data_size + 1);
>  
>  	/* Set the last word to be reserved */
>  	current->ret_stack[curr_ret_stack - 1] = val;
> @@ -371,7 +417,7 @@ void *fgraph_retrieve_data(int idx, int *size_bytes)
>  found:
>  	if (size_bytes)
>  		*size_bytes = __get_data_size(val) * sizeof(long);
> -	return get_fgraph_data(current, offset);
> +	return get_data_type_data(current, offset);
>  }
>  
>  /**
> @@ -394,6 +440,9 @@ unsigned long *fgraph_get_task_var(struct fgraph_ops *gops)
>   * @offset: The offset into @t->ret_stack to find the ret_stack entry
>   * @frame_offset: Where to place the offset into @t->ret_stack of that entry
>   *
> + * Returns a pointer to the previous ret_stack below @offset or NULL
> + *   when it reaches the bottom of the stack.
> + *
>   * Calling this with:
>   *
>   *   offset = task->curr_ret_stack;
> @@ -493,30 +542,21 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
>  	if (!current->ret_stack)
>  		return -EBUSY;
>  
> -	/*
> -	 * At first, check whether the previous fgraph callback is pushed by
> -	 * the fgraph on the same function entry.
> -	 * But if @func is the self tail-call function, we also need to ensure
> -	 * the ret_stack is not for the previous call by checking whether the
> -	 * bit of @fgraph_idx is set or not.
> -	 */
> -	ret_stack = get_ret_stack(current, current->curr_ret_stack, &offset);
> -	if (ret_stack && ret_stack->func == func &&
> -	    get_fgraph_type(current, offset + FGRAPH_FRAME_OFFSET) == FGRAPH_TYPE_BITMAP &&
> -	    !is_fgraph_index_set(current, offset + FGRAPH_FRAME_OFFSET, fgraph_idx))
> -		return offset + FGRAPH_FRAME_OFFSET;
> +	BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));
>  
> +	/* Set val to "reserved" with the delta to the new fgraph frame */
>  	val = (FGRAPH_TYPE_RESERVED << FGRAPH_TYPE_SHIFT) | FGRAPH_FRAME_OFFSET;
>  
> -	BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));
> -
>  	/*
>  	 * We must make sure the ret_stack is tested before we read
>  	 * anything else.
>  	 */
>  	smp_rmb();
>  
> -	/* The return trace stack is full */
> +	/*
> +	 * Check if there's room on the shadow stack to fit a fraph frame
> +	 * and a bitmap word.
> +	 */
>  	if (current->curr_ret_stack + FGRAPH_FRAME_OFFSET + 1 >= SHADOW_STACK_MAX_OFFSET) {
>  		atomic_inc(&current->trace_overrun);
>  		return -EBUSY;
> @@ -545,7 +585,7 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
>  	 * at least a correct offset!
>  	 */
>  	barrier();
> -	current->curr_ret_stack = offset + 1;
> +	WRITE_ONCE(current->curr_ret_stack, offset + 1);
>  	/*
>  	 * This next barrier is to ensure that an interrupt coming in
>  	 * will not corrupt what we are about to write.
> @@ -597,7 +637,8 @@ int function_graph_enter(unsigned long ret, unsigned long func,
>  	if (offset < 0)
>  		goto out;
>  
> -	for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
> +	for_each_set_bit(i, &fgraph_array_bitmask,
> +			 sizeof(fgraph_array_bitmask) * BITS_PER_BYTE) {
>  		struct fgraph_ops *gops = fgraph_array[i];
>  		int save_curr_ret_stack;
>  
> @@ -620,7 +661,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
>  	 * Since this function uses fgraph_idx = 0 as a tail-call checking
>  	 * flag, set that bit always.
>  	 */
> -	set_fgraph_index_bitmap(current, offset, bitmap | BIT(0));
> +	set_bitmap(current, offset, bitmap | BIT(0));
>  
>  	return 0;
>   out_ret:
> @@ -659,9 +700,9 @@ int function_graph_enter_ops(unsigned long ret, unsigned long func,
>  	save_curr_ret_stack = current->curr_ret_stack;
>  	if (gops->entryfunc(&trace, gops)) {
>  		if (type == FGRAPH_TYPE_RESERVED)
> -			set_fgraph_index_bitmap(current, offset, BIT(gops->idx));
> +			set_bitmap(current, offset, BIT(gops->idx));
>  		else
> -			add_fgraph_index_bitmap(current, offset, BIT(gops->idx));
> +			set_bitmap_bits(current, offset, BIT(gops->idx));
>  		return 0;
>  	} else
>  		current->curr_ret_stack = save_curr_ret_stack;
> @@ -791,12 +832,11 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
>  	trace.retval = fgraph_ret_regs_return_value(ret_regs);
>  #endif
>  
> -	bitmap = get_fgraph_index_bitmap(current, offset);
> -	for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
> +	bitmap = get_bitmap_bits(current, offset);
> +
> +	for_each_set_bit(i, &bitmap, sizeof(bitmap) * BITS_PER_BYTE) {
>  		struct fgraph_ops *gops = fgraph_array[i];
>  
> -		if (!(bitmap & BIT(i)))
> -			continue;
>  		if (gops == &fgraph_stub)
>  			continue;
>  
> @@ -879,9 +919,10 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
>  				    unsigned long ret, unsigned long *retp)
>  {
>  	struct ftrace_ret_stack *ret_stack;
> +	unsigned long return_handler = (unsigned long)dereference_kernel_function_descriptor(return_to_handler);
>  	int i = task->curr_ret_stack;
>  
> -	if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
> +	if (ret != return_handler)
>  		return ret;
>  
>  	while (i > 0) {
> @@ -891,14 +932,13 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
>  		/*
>  		 * For the tail-call, there would be 2 or more ftrace_ret_stacks on
>  		 * the ret_stack, which records "return_to_handler" as the return
> -		 * address excpt for the last one.
> +		 * address except for the last one.
>  		 * But on the real stack, there should be 1 entry because tail-call
>  		 * reuses the return address on the stack and jump to the next function.
>  		 * Thus we will continue to find real return address.
>  		 */
>  		if (ret_stack->retp == retp &&
> -		    ret_stack->ret !=
> -		    (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
> +		    ret_stack->ret != return_handler)
>  			return ret_stack->ret;
>  	}
>  
> @@ -909,10 +949,11 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
>  				    unsigned long ret, unsigned long *retp)
>  {
>  	struct ftrace_ret_stack *ret_stack;
> +	unsigned long return_handler = (unsigned long)dereference_kernel_function_descriptor(return_to_handler);
>  	int offset = task->curr_ret_stack;
>  	int i;
>  
> -	if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
> +	if (ret != return_handler)
>  		return ret;
>  
>  	if (!idx)
> @@ -921,8 +962,7 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
>  	i = *idx;
>  	do {
>  		ret_stack = get_ret_stack(task, offset, &offset);
> -		if (ret_stack && ret_stack->ret ==
> -		    (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
> +		if (ret_stack && ret_stack->ret == return_handler)
>  			continue;
>  		i--;
>  	} while (i >= 0 && ret_stack);


-- 
Masami Hiramatsu (Google) <mhiramat@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ