Message-Id: <20240527093747.cb9e49d660684158e8173133@kernel.org>
Date: Mon, 27 May 2024 09:37:47 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, Masami
Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Andrew Morton
<akpm@...ux-foundation.org>, Alexei Starovoitov
<alexei.starovoitov@...il.com>, Florent Revest <revest@...omium.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, bpf <bpf@...r.kernel.org>, Sven
Schnelle <svens@...ux.ibm.com>, Alexei Starovoitov <ast@...nel.org>, Jiri
Olsa <jolsa@...nel.org>, Arnaldo Carvalho de Melo <acme@...nel.org>, Daniel
Borkmann <daniel@...earbox.net>, Alan Maguire <alan.maguire@...cle.com>,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner
<tglx@...utronix.de>, Guo Ren <guoren@...nel.org>
Subject: Re: [PATCH 00/20] function_graph: Allow multiple users for function
graph tracing
On Fri, 24 May 2024 22:36:52 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:
> [
> Resend for some of you as I missed a comma in the Cc and
> quilt died sending this out.
> ]
>
> This is a continuation of the function graph multi-user code.
> I wrote a proof of concept of this code back in 2019[1] and
> Masami started cleaning it up. I started from v10 of Masami's
> work, which can be found here:
>
> https://lore.kernel.org/linux-trace-kernel/171509088006.162236.7227326999861366050.stgit@devnote2/
>
> This is *only* the code that allows multiple users of function
> graph tracing. It is not the fprobe work that Masami is working
> to add on top of it. When Masami took my proof of concept, there
> were still several things I disliked about that code. Instead of
> having Masami clean it up even more, I decided to take over just
> my code and change it up a bit.
>
> The biggest changes from where Masami left off are mostly renames of
> variables, macros, and function names. I also fixed up the existing comments
> and added more to make the code a bit more understandable.
>
> At the end of the series, I added two patches to optimize the entry
> and exit. On entry, there was a loop that iterated over the 16 elements
> of the fgraph_array[] looking for any that may have a gops registered
> to them. It's quite a waste to do that loop if there's only one
> registered user. To fix that, I added an fgraph_array_bitmask whose
> set bits correspond to the registered elements of the array. Then
> a simple for_each_set_bit() is used for the iteration. I do the same
> thing in the exit callback of the function, which iterates over the
> bits of the bitmap saved on the ret_stack.
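
[ Not part of the series -- just a rough userspace sketch of the
  entry-side idea above. The kernel side uses set_bit()/clear_bit()
  and for_each_set_bit() on fgraph_array_bitmask; here a plain
  unsigned long and __builtin_ctzl() stand in for them, and the
  register_slot()/unregister_slot() helpers are made up for the
  demo. ]

#include <stdio.h>

#define FGRAPH_ARRAY_SIZE 16

static unsigned long fgraph_array_bitmask;

/* kernel: set_bit(idx, &fgraph_array_bitmask) when a gops registers */
static void register_slot(int idx)
{
	fgraph_array_bitmask |= 1UL << idx;
}

/* kernel: clear_bit(idx, &fgraph_array_bitmask) on unregister */
static void unregister_slot(int idx)
{
	fgraph_array_bitmask &= ~(1UL << idx);
}

int main(void)
{
	unsigned long mask;

	register_slot(0);
	register_slot(3);

	/* kernel: for_each_set_bit(i, &fgraph_array_bitmask, ...) */
	for (mask = fgraph_array_bitmask; mask; mask &= mask - 1) {
		int i = __builtin_ctzl(mask);

		/* only registered slots are visited, not all 16 */
		printf("would call fgraph_array[%d]->entryfunc()\n", i);
	}
	return 0;
}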
>
> I also noticed that Masami added code to handle tail calls in the
> unwinder and had included it in one of my patches. I took that code out
> and made it a separate patch with Masami as the author.
>
> The diff between this and Masami's last update is at the end of this email.
Thanks for updating the patches!
I think your changes are good. I just have some comments and have replied to them.
So, the plan is to push this series into tracing/for-next? I will
port my fprobe part on top of it and run some tests.
Thank you,
>
> Based on Linus commit: 0eb03c7e8e2a4cc3653eb5eeb2d2001182071215
>
> [1] https://lore.kernel.org/all/20190525031633.811342628@goodmis.org/
>
> Masami Hiramatsu (Google) (3):
> function_graph: Handle tail calls for stack unwinding
> function_graph: Use a simple LRU for fgraph_array index number
> ftrace: Add multiple fgraph storage selftest
>
> Steven Rostedt (Google) (2):
> function_graph: Use for_each_set_bit() in __ftrace_return_to_handler()
> function_graph: Use bitmask to loop on fgraph entry
>
> Steven Rostedt (VMware) (15):
> function_graph: Convert ret_stack to a series of longs
> fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long
> function_graph: Add an array structure that will allow multiple callbacks
> function_graph: Allow multiple users to attach to function graph
> function_graph: Remove logic around ftrace_graph_entry and return
> ftrace/function_graph: Pass fgraph_ops to function graph callbacks
> ftrace: Allow function_graph tracer to be enabled in instances
> ftrace: Allow ftrace startup flags to exist without dynamic ftrace
> function_graph: Have the instances use their own ftrace_ops for filtering
> function_graph: Add "task variables" per task for fgraph_ops
> function_graph: Move set_graph_function tests to shadow stack global var
> function_graph: Move graph depth stored data to shadow stack global var
> function_graph: Move graph notrace bit to shadow stack global var
> function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
> function_graph: Add selftest for passing local variables
>
> ----
> arch/arm64/kernel/ftrace.c | 21 +-
> arch/loongarch/kernel/ftrace_dyn.c | 15 +-
> arch/powerpc/kernel/trace/ftrace.c | 3 +-
> arch/riscv/kernel/ftrace.c | 15 +-
> arch/x86/kernel/ftrace.c | 19 +-
> include/linux/ftrace.h | 42 +-
> include/linux/sched.h | 2 +-
> include/linux/trace_recursion.h | 39 --
> kernel/trace/fgraph.c | 994 ++++++++++++++++++++++++++++-------
> kernel/trace/ftrace.c | 11 +-
> kernel/trace/ftrace_internal.h | 2 -
> kernel/trace/trace.h | 94 +++-
> kernel/trace/trace_functions.c | 8 +
> kernel/trace/trace_functions_graph.c | 96 ++--
> kernel/trace/trace_irqsoff.c | 10 +-
> kernel/trace/trace_sched_wakeup.c | 10 +-
> kernel/trace/trace_selftest.c | 259 ++++++++-
> 17 files changed, 1330 insertions(+), 310 deletions(-)
>
>
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index 3313e4b83aa2..1aae521e5997 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -18,29 +18,36 @@
> #include "ftrace_internal.h"
> #include "trace.h"
>
> -#define FGRAPH_FRAME_SIZE sizeof(struct ftrace_ret_stack)
> -#define FGRAPH_FRAME_OFFSET DIV_ROUND_UP(FGRAPH_FRAME_SIZE, sizeof(long))
>
> /*
> - * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
> - * is allocated on the task's ret_stack with bitmap entry, then each
> - * fgraph_ops on the fgraph_array[]'s entryfunc is called and if that returns
> - * non-zero, the index into the fgraph_array[] for that fgraph_ops is recorded
> - * on the bitmap entry as a bit flag.
> - *
> - * The top of the ret_stack (when not empty) will always have a reference
> - * to the last ftrace_ret_stack saved. All references to the
> - * ftrace_ret_stack has the format of:
> + * FGRAPH_FRAME_SIZE: Size in bytes of the meta data on the shadow stack
> + * FGRAPH_FRAME_OFFSET: Size in long words of the meta data frame
> + */
> +#define FGRAPH_FRAME_SIZE sizeof(struct ftrace_ret_stack)
> +#define FGRAPH_FRAME_OFFSET DIV_ROUND_UP(FGRAPH_FRAME_SIZE, sizeof(long))
> +
> +/*
> + * On entry to a function (via function_graph_enter()), a new fgraph frame
> + * (ftrace_ret_stack) is pushed onto the stack as well as a word that
> + * holds a bitmask and a type (called "bitmap"). The bitmap is defined as:
> *
> * bits: 0 - 9 offset in words from the previous ftrace_ret_stack
> - * (bitmap type should have FGRAPH_FRAME_OFFSET always)
> + *
> * bits: 10 - 11 Type of storage
> * 0 - reserved
> * 1 - bitmap of fgraph_array index
> * 2 - reserved data
> *
> - * For bitmap of fgraph_array index
> + * For type with "bitmap of fgraph_array index" (FGRAPH_TYPE_BITMAP):
> * bits: 12 - 27 The bitmap of fgraph_ops fgraph_array index
> + * That is, it's a bitmask of 0-15 (16 bits)
> + * where if a corresponding ops in the fgraph_array[]
> + * expects a callback from the return of the function
> + * it's corresponding bit will be set.
> + *
> + *
> + * The top of the ret_stack (when not empty) will always have a reference
> + * word that points to the last fgraph frame that was saved.
> *
> * For reserved data:
> * bits: 12 - 17 The size in words that is stored
> @@ -52,7 +59,7 @@
> * stored two words of data, this is what will be on the task's shadow
> * ret_stack: (the stack grows upward)
> *
> - * ret_stack[SHADOW_STACK_IN_WORD]
> + * ret_stack[SHADOW_STACK_OFFSET]
> * | SHADOW_STACK_TASK_VARS(ret_stack)[15] |
> * ...
> * | SHADOW_STACK_TASK_VARS(ret_stack)[0] |
> @@ -60,6 +67,8 @@
> * ...
> * | | <- task->curr_ret_stack
> * +--------------------------------------------+
> + * | (3 << 12) | (3 << 10) | FGRAPH_FRAME_OFFSET|
> + * | *or put another way* |
> * | (3 << FGRAPH_DATA_INDEX_SHIFT)| \ | This is for fgraph_ops[3].
> * | ((2 - 1) << FGRAPH_DATA_SHIFT)| \ | The data size is 2 words.
> * | (FGRAPH_TYPE_DATA << FGRAPH_TYPE_SHIFT)| \ |
> @@ -67,7 +76,9 @@
> * +--------------------------------------------+ ( It is 4 words from the ret_stack)
> * | STORED DATA WORD 2 |
> * | STORED DATA WORD 1 |
> - * +------i-------------------------------------+
> + * +--------------------------------------------+
> + * | (9 << 12) | (1 << 10) | FGRAPH_FRAME_OFFSET|
> + * | *or put another way* |
> * | (BIT(3)|BIT(0)) << FGRAPH_INDEX_SHIFT | \ |
> * | FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT| \ |
> * | (offset1:FGRAPH_FRAME_OFFSET) | <- the offset1 is from here
> @@ -82,17 +93,24 @@
> *
> * If a backtrace is required, and the real return pointer needs to be
> * fetched, then it looks at the task's curr_ret_stack offset, if it
> - * is greater than zero (reserved, or right before poped), it would mask
> + * is greater than zero (reserved, or right before popped), it would mask
> * the value by FGRAPH_FRAME_OFFSET_MASK to get the offset of the
> * ftrace_ret_stack structure stored on the shadow stack.
> */
>
> -#define FGRAPH_FRAME_OFFSET_SIZE 10
> -#define FGRAPH_FRAME_OFFSET_MASK GENMASK(FGRAPH_FRAME_OFFSET_SIZE - 1, 0)
> +/*
> + * The following is for the top word on the stack:
> + *
> + * FGRAPH_FRAME_OFFSET (0-9) holds the offset delta to the fgraph frame
> + * FGRAPH_TYPE (10-11) holds the type of word this is.
> + * (RESERVED or BITMAP)
> + */
> +#define FGRAPH_FRAME_OFFSET_BITS 10
> +#define FGRAPH_FRAME_OFFSET_MASK GENMASK(FGRAPH_FRAME_OFFSET_BITS - 1, 0)
>
> -#define FGRAPH_TYPE_SIZE 2
> -#define FGRAPH_TYPE_MASK GENMASK(FGRAPH_TYPE_SIZE - 1, 0)
> -#define FGRAPH_TYPE_SHIFT FGRAPH_FRAME_OFFSET_SIZE
> +#define FGRAPH_TYPE_BITS 2
> +#define FGRAPH_TYPE_MASK GENMASK(FGRAPH_TYPE_BITS - 1, 0)
> +#define FGRAPH_TYPE_SHIFT FGRAPH_FRAME_OFFSET_BITS
>
> enum {
> FGRAPH_TYPE_RESERVED = 0,
> @@ -100,31 +118,48 @@ enum {
> FGRAPH_TYPE_DATA = 2,
> };
>
> -#define FGRAPH_INDEX_SIZE 16
> -#define FGRAPH_INDEX_MASK GENMASK(FGRAPH_INDEX_SIZE - 1, 0)
> -#define FGRAPH_INDEX_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
> +/*
> + * For BITMAP type:
> + * FGRAPH_INDEX (12-27) bits holding the gops index wanting return callback called
> + */
> +#define FGRAPH_INDEX_BITS 16
> +#define FGRAPH_INDEX_MASK GENMASK(FGRAPH_INDEX_BITS - 1, 0)
> +#define FGRAPH_INDEX_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_BITS)
>
> -/* The data size == 0 means 1 word, and 31 (=2^5 - 1) means 32 words. */
> -#define FGRAPH_DATA_SIZE 5
> -#define FGRAPH_DATA_MASK GENMASK(FGRAPH_DATA_SIZE - 1, 0)
> -#define FGRAPH_DATA_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
> -#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * (1 << FGRAPH_DATA_SIZE))
> +/*
> + * For DATA type:
> + * FGRAPH_DATA (12-17) bits hold the size of data (in words)
> + * FGRAPH_INDEX (18-23) bits hold the index for which gops->idx the data is for
> + *
> + * Note:
> + * data_size == 0 means 1 word, and 31 (=2^5 - 1) means 32 words.
> + */
> +#define FGRAPH_DATA_BITS 5
> +#define FGRAPH_DATA_MASK GENMASK(FGRAPH_DATA_BITS - 1, 0)
> +#define FGRAPH_DATA_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_BITS)
> +#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * (1 << FGRAPH_DATA_BITS))
>
> -#define FGRAPH_DATA_INDEX_SIZE 4
> -#define FGRAPH_DATA_INDEX_MASK GENMASK(FGRAPH_DATA_INDEX_SIZE - 1, 0)
> -#define FGRAPH_DATA_INDEX_SHIFT (FGRAPH_DATA_SHIFT + FGRAPH_DATA_SIZE)
> +#define FGRAPH_DATA_INDEX_BITS 4
> +#define FGRAPH_DATA_INDEX_MASK GENMASK(FGRAPH_DATA_INDEX_BITS - 1, 0)
> +#define FGRAPH_DATA_INDEX_SHIFT (FGRAPH_DATA_SHIFT + FGRAPH_DATA_BITS)
>
> #define FGRAPH_MAX_INDEX \
> - ((FGRAPH_INDEX_SIZE << FGRAPH_DATA_SIZE) + FGRAPH_RET_INDEX)
> + ((FGRAPH_INDEX_SIZE << FGRAPH_DATA_BITS) + FGRAPH_RET_INDEX)
>
> -#define FGRAPH_ARRAY_SIZE FGRAPH_INDEX_SIZE
> +#define FGRAPH_ARRAY_SIZE FGRAPH_INDEX_BITS
>
> -#define SHADOW_STACK_SIZE (PAGE_SIZE)
> -#define SHADOW_STACK_IN_WORD (SHADOW_STACK_SIZE / sizeof(long))
> +/*
> + * SHADOW_STACK_SIZE: The size in bytes of the entire shadow stack
> + * SHADOW_STACK_OFFSET: The size in long words of the shadow stack
> + * SHADOW_STACK_MAX_OFFSET: The max offset of the stack for a new frame to be added
> + */
> +#define SHADOW_STACK_SIZE (PAGE_SIZE)
> +#define SHADOW_STACK_OFFSET (SHADOW_STACK_SIZE / sizeof(long))
> /* Leave on a buffer at the end */
> #define SHADOW_STACK_MAX_OFFSET \
> - (SHADOW_STACK_IN_WORD - (FGRAPH_FRAME_OFFSET + 1 + FGRAPH_ARRAY_SIZE))
> + (SHADOW_STACK_OFFSET - (FGRAPH_FRAME_OFFSET + 1 + FGRAPH_ARRAY_SIZE))
>
> +/* RET_STACK(): Return the frame from a given @offset from task @t */
> #define RET_STACK(t, offset) ((struct ftrace_ret_stack *)(&(t)->ret_stack[offset]))
>
> /*
> @@ -132,12 +167,13 @@ enum {
> * ret_stack to store task specific state.
> */
> #define SHADOW_STACK_TASK_VARS(ret_stack) \
> - ((unsigned long *)(&(ret_stack)[SHADOW_STACK_IN_WORD - FGRAPH_ARRAY_SIZE]))
> + ((unsigned long *)(&(ret_stack)[SHADOW_STACK_OFFSET - FGRAPH_ARRAY_SIZE]))
>
> DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
> int ftrace_graph_active;
>
> static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
> +static unsigned long fgraph_array_bitmask;
>
> /* LRU index table for fgraph_array */
> static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
> @@ -162,6 +198,8 @@ static int fgraph_lru_release_index(int idx)
>
> fgraph_lru_table[fgraph_lru_last] = idx;
> fgraph_lru_last = (fgraph_lru_last + 1) % FGRAPH_ARRAY_SIZE;
> +
> + clear_bit(idx, &fgraph_array_bitmask);
> return 0;
> }
>
> @@ -176,70 +214,77 @@ static int fgraph_lru_alloc_index(void)
>
> fgraph_lru_table[fgraph_lru_next] = -1;
> fgraph_lru_next = (fgraph_lru_next + 1) % FGRAPH_ARRAY_SIZE;
> +
> + set_bit(idx, &fgraph_array_bitmask);
> return idx;
> }
>
> +/* Get the offset to the fgraph frame from a ret_stack value */
> static inline int __get_offset(unsigned long val)
> {
> return val & FGRAPH_FRAME_OFFSET_MASK;
> }
>
> +/* Get the type of word from a ret_stack value */
> static inline int __get_type(unsigned long val)
> {
> return (val >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
> }
>
> +/* Get the data_index for a DATA type ret_stack word */
> static inline int __get_data_index(unsigned long val)
> {
> return (val >> FGRAPH_DATA_INDEX_SHIFT) & FGRAPH_DATA_INDEX_MASK;
> }
>
> +/* Get the data_size for a DATA type ret_stack word */
> static inline int __get_data_size(unsigned long val)
> {
> return ((val >> FGRAPH_DATA_SHIFT) & FGRAPH_DATA_MASK) + 1;
> }
>
> +/* Get the word from the ret_stack at @offset */
> static inline unsigned long get_fgraph_entry(struct task_struct *t, int offset)
> {
> return t->ret_stack[offset];
> }
>
> +/* Get the FRAME_OFFSET from the word from the @offset on ret_stack */
> static inline int get_frame_offset(struct task_struct *t, int offset)
> {
> return __get_offset(t->ret_stack[offset]);
> }
>
> +/* Get FGRAPH_TYPE from the word from the @offset at ret_stack */
> static inline int get_fgraph_type(struct task_struct *t, int offset)
> {
> return __get_type(t->ret_stack[offset]);
> }
>
> +/* For BITMAP type: get the bitmask from the @offset at ret_stack */
> static inline unsigned long
> -get_fgraph_index_bitmap(struct task_struct *t, int offset)
> +get_bitmap_bits(struct task_struct *t, int offset)
> {
> return (t->ret_stack[offset] >> FGRAPH_INDEX_SHIFT) & FGRAPH_INDEX_MASK;
> }
>
> +/* For BITMAP type: set the bits in the bitmap bitmask at @offset on ret_stack */
> static inline void
> -set_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
> -{
> - t->ret_stack[offset] = (bitmap << FGRAPH_INDEX_SHIFT) |
> - (FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT) | FGRAPH_FRAME_OFFSET;
> -}
> -
> -static inline bool is_fgraph_index_set(struct task_struct *t, int offset, int idx)
> +set_bitmap_bits(struct task_struct *t, int offset, unsigned long bitmap)
> {
> - return !!(get_fgraph_index_bitmap(t, offset) & BIT(idx));
> + t->ret_stack[offset] |= (bitmap << FGRAPH_INDEX_SHIFT);
> }
>
> +/* Write the bitmap to the ret_stack at @offset (does index, offset and bitmask) */
> static inline void
> -add_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
> +set_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
> {
> - t->ret_stack[offset] |= (bitmap << FGRAPH_INDEX_SHIFT);
> + t->ret_stack[offset] = (bitmap << FGRAPH_INDEX_SHIFT) |
> + (FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT) | FGRAPH_FRAME_OFFSET;
> }
>
> -/* Note: @offset is the offset of FGRAPH_TYPE_DATA entry. */
> -static inline void *get_fgraph_data(struct task_struct *t, int offset)
> +/* For DATA type: get the data saved under the ret_stack word at @offset */
> +static inline void *get_data_type_data(struct task_struct *t, int offset)
> {
> unsigned long val = t->ret_stack[offset];
>
> @@ -249,7 +294,8 @@ static inline void *get_fgraph_data(struct task_struct *t, int offset)
> return (void *)&t->ret_stack[offset];
> }
>
> -static inline unsigned long make_fgraph_data(int idx, int size, int offset)
> +/* Create the ret_stack word for a DATA type */
> +static inline unsigned long make_data_type_val(int idx, int size, int offset)
> {
> return (idx << FGRAPH_DATA_INDEX_SHIFT) |
> ((size - 1) << FGRAPH_DATA_SHIFT) |
> @@ -326,7 +372,7 @@ void *fgraph_reserve_data(int idx, int size_bytes)
> if (unlikely(curr_ret_stack >= SHADOW_STACK_MAX_OFFSET))
> return NULL;
>
> - val = make_fgraph_data(idx, data_size, __get_offset(val) + data_size + 1);
> + val = make_data_type_val(idx, data_size, __get_offset(val) + data_size + 1);
>
> /* Set the last word to be reserved */
> current->ret_stack[curr_ret_stack - 1] = val;
> @@ -371,7 +417,7 @@ void *fgraph_retrieve_data(int idx, int *size_bytes)
> found:
> if (size_bytes)
> *size_bytes = __get_data_size(val) * sizeof(long);
> - return get_fgraph_data(current, offset);
> + return get_data_type_data(current, offset);
> }
>
> /**
> @@ -394,6 +440,9 @@ unsigned long *fgraph_get_task_var(struct fgraph_ops *gops)
> * @offset: The offset into @t->ret_stack to find the ret_stack entry
> * @frame_offset: Where to place the offset into @t->ret_stack of that entry
> *
> + * Returns a pointer to the previous ret_stack below @offset or NULL
> + * when it reaches the bottom of the stack.
> + *
> * Calling this with:
> *
> * offset = task->curr_ret_stack;
> @@ -493,30 +542,21 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
> if (!current->ret_stack)
> return -EBUSY;
>
> - /*
> - * At first, check whether the previous fgraph callback is pushed by
> - * the fgraph on the same function entry.
> - * But if @func is the self tail-call function, we also need to ensure
> - * the ret_stack is not for the previous call by checking whether the
> - * bit of @fgraph_idx is set or not.
> - */
> - ret_stack = get_ret_stack(current, current->curr_ret_stack, &offset);
> - if (ret_stack && ret_stack->func == func &&
> - get_fgraph_type(current, offset + FGRAPH_FRAME_OFFSET) == FGRAPH_TYPE_BITMAP &&
> - !is_fgraph_index_set(current, offset + FGRAPH_FRAME_OFFSET, fgraph_idx))
> - return offset + FGRAPH_FRAME_OFFSET;
> + BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));
>
> + /* Set val to "reserved" with the delta to the new fgraph frame */
> val = (FGRAPH_TYPE_RESERVED << FGRAPH_TYPE_SHIFT) | FGRAPH_FRAME_OFFSET;
>
> - BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));
> -
> /*
> * We must make sure the ret_stack is tested before we read
> * anything else.
> */
> smp_rmb();
>
> - /* The return trace stack is full */
> + /*
> + * Check if there's room on the shadow stack to fit a fraph frame
> + * and a bitmap word.
> + */
> if (current->curr_ret_stack + FGRAPH_FRAME_OFFSET + 1 >= SHADOW_STACK_MAX_OFFSET) {
> atomic_inc(¤t->trace_overrun);
> return -EBUSY;
> @@ -545,7 +585,7 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
> * at least a correct offset!
> */
> barrier();
> - current->curr_ret_stack = offset + 1;
> + WRITE_ONCE(current->curr_ret_stack, offset + 1);
> /*
> * This next barrier is to ensure that an interrupt coming in
> * will not corrupt what we are about to write.
> @@ -597,7 +637,8 @@ int function_graph_enter(unsigned long ret, unsigned long func,
> if (offset < 0)
> goto out;
>
> - for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
> + for_each_set_bit(i, &fgraph_array_bitmask,
> + sizeof(fgraph_array_bitmask) * BITS_PER_BYTE) {
> struct fgraph_ops *gops = fgraph_array[i];
> int save_curr_ret_stack;
>
> @@ -620,7 +661,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
> * Since this function uses fgraph_idx = 0 as a tail-call checking
> * flag, set that bit always.
> */
> - set_fgraph_index_bitmap(current, offset, bitmap | BIT(0));
> + set_bitmap(current, offset, bitmap | BIT(0));
>
> return 0;
> out_ret:
> @@ -659,9 +700,9 @@ int function_graph_enter_ops(unsigned long ret, unsigned long func,
> save_curr_ret_stack = current->curr_ret_stack;
> if (gops->entryfunc(&trace, gops)) {
> if (type == FGRAPH_TYPE_RESERVED)
> - set_fgraph_index_bitmap(current, offset, BIT(gops->idx));
> + set_bitmap(current, offset, BIT(gops->idx));
> else
> - add_fgraph_index_bitmap(current, offset, BIT(gops->idx));
> + set_bitmap_bits(current, offset, BIT(gops->idx));
> return 0;
> } else
> current->curr_ret_stack = save_curr_ret_stack;
> @@ -791,12 +832,11 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
> trace.retval = fgraph_ret_regs_return_value(ret_regs);
> #endif
>
> - bitmap = get_fgraph_index_bitmap(current, offset);
> - for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
> + bitmap = get_bitmap_bits(current, offset);
> +
> + for_each_set_bit(i, &bitmap, sizeof(bitmap) * BITS_PER_BYTE) {
> struct fgraph_ops *gops = fgraph_array[i];
>
> - if (!(bitmap & BIT(i)))
> - continue;
> if (gops == &fgraph_stub)
> continue;
>
> @@ -879,9 +919,10 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
> unsigned long ret, unsigned long *retp)
> {
> struct ftrace_ret_stack *ret_stack;
> + unsigned long return_handler = (unsigned long)dereference_kernel_function_descriptor(return_to_handler);
> int i = task->curr_ret_stack;
>
> - if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
> + if (ret != return_handler)
> return ret;
>
> while (i > 0) {
> @@ -891,14 +932,13 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
> /*
> * For the tail-call, there would be 2 or more ftrace_ret_stacks on
> * the ret_stack, which records "return_to_handler" as the return
> - * address excpt for the last one.
> + * address except for the last one.
> * But on the real stack, there should be 1 entry because tail-call
> * reuses the return address on the stack and jump to the next function.
> * Thus we will continue to find real return address.
> */
> if (ret_stack->retp == retp &&
> - ret_stack->ret !=
> - (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
> + ret_stack->ret != return_handler)
> return ret_stack->ret;
> }
>
> @@ -909,10 +949,11 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
> unsigned long ret, unsigned long *retp)
> {
> struct ftrace_ret_stack *ret_stack;
> + unsigned long return_handler = (unsigned long)dereference_kernel_function_descriptor(return_to_handler);
> int offset = task->curr_ret_stack;
> int i;
>
> - if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
> + if (ret != return_handler)
> return ret;
>
> if (!idx)
> @@ -921,8 +962,7 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
> i = *idx;
> do {
> ret_stack = get_ret_stack(task, offset, &offset);
> - if (ret_stack && ret_stack->ret ==
> - (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
> + if (ret_stack && ret_stack->ret == return_handler)
> continue;
> i--;
> } while (i >= 0 && ret_stack);
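
[ Not part of the series -- a rough userspace sketch of how a
  ret_stack "bitmap"/"data" word decomposes, using the FGRAPH_*
  field widths and shifts from the fgraph.c comment and macros in
  the diff above. The decode_word() helper and the example value
  are made up for the demo. ]

#include <stdio.h>

#define FGRAPH_FRAME_OFFSET_BITS	10
#define FGRAPH_TYPE_BITS		2
#define FGRAPH_TYPE_SHIFT		FGRAPH_FRAME_OFFSET_BITS
#define FGRAPH_INDEX_BITS		16
#define FGRAPH_INDEX_SHIFT		(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_BITS)
#define FGRAPH_DATA_BITS		5
#define FGRAPH_DATA_SHIFT		(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_BITS)
#define FGRAPH_DATA_INDEX_SHIFT		(FGRAPH_DATA_SHIFT + FGRAPH_DATA_BITS)

enum { FGRAPH_TYPE_RESERVED, FGRAPH_TYPE_BITMAP, FGRAPH_TYPE_DATA };

static void decode_word(unsigned long val)
{
	unsigned long offset = val & ((1UL << FGRAPH_FRAME_OFFSET_BITS) - 1);
	unsigned long type = (val >> FGRAPH_TYPE_SHIFT) & ((1UL << FGRAPH_TYPE_BITS) - 1);

	printf("offset to fgraph frame: %lu words, type: %lu\n", offset, type);
	if (type == FGRAPH_TYPE_BITMAP)
		printf("  gops bitmask: 0x%lx\n",
		       (val >> FGRAPH_INDEX_SHIFT) & ((1UL << FGRAPH_INDEX_BITS) - 1));
	else if (type == FGRAPH_TYPE_DATA)
		printf("  %lu data word(s) for gops idx %lu\n",
		       ((val >> FGRAPH_DATA_SHIFT) & ((1UL << FGRAPH_DATA_BITS) - 1)) + 1,
		       (val >> FGRAPH_DATA_INDEX_SHIFT) & 0xf);
}

int main(void)
{
	/* BITMAP word for gops 0 and 3, 4 words back to its ftrace_ret_stack */
	decode_word(((1UL << 3 | 1UL << 0) << FGRAPH_INDEX_SHIFT) |
		    ((unsigned long)FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT) | 4);
	return 0;
}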
--
Masami Hiramatsu (Google) <mhiramat@...nel.org>