Message-ID: <20210308131558.6ee6f98e@gandalf.local.home>
Date: Mon, 8 Mar 2021 13:15:58 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: "Yordan Karadzhov (VMware)" <y.karadz@...il.com>
Cc: linux-kernel@...r.kernel.org, tglx@...utronix.de,
peterz@...radead.org
Subject: Re: [RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"
On Mon, 8 Mar 2021 16:57:29 +0200
"Yordan Karadzhov (VMware)" <y.karadz@...il.com> wrote:
> > If we had:
> >
> > <...>-37 [004] ...1 2022.303820: gc_worker <-process_one_work
> > <...>-37 [004] ...1 2022.303820: ___might_sleep <-gc_worker
> > <...>-37 [004] ...1 2022.303831: ___might_sleep <-gc_worker (last ts: 2022.303828 repeats: 127)
> > <...>-37 [004] ...1 2022.303831: queue_delayed_work_on <-process_one_work
> >
> > We would know the last time ___might_sleep was called.
> >
> > That is, not only should we save the ip and pip in the trace_func_repeats
> > structure, but we should also store the timestamp of the last function
> > event that repeated. Otherwise the above makes it look like the last
> > ___might_sleep call happened when the queue_delayed_work_on happened,
> > which may not be the case.
>
> If we store the last timestamp, we will need an additional 64 bits in the
> buffer every time we record the "func_repeats" event. This looks like
> overkill to me.
> Can we store only the duration of the repeats (the difference between the
> timestamps)? This way we use less memory at the price of one extra
> arithmetic operation.
> An alternative approach would be to store only the least-significant bits
> of the timestamp.
>
> What do you think?
I like the way you're thinking ;-)
Let's take a look at the current event sizes.
The function event is defined as:
name: function
ID: 1
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long ip; offset:8; size:8; signed:0;
field:unsigned long parent_ip; offset:16; size:8; signed:0;
Showing that its total size is 24 bytes (offset:16 + size:8)
and your current repeat event has:
ID: 17
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long ip; offset:8; size:8; signed:0;
field:unsigned long pip; offset:16; size:8; signed:0;
field:unsigned long count; offset:24; size:8; signed:0;
Which is 32 bytes. Adding another 8 would make it 40: bigger than one
function event, but still smaller than two. Even adding just 4 bytes would
still leave it bigger than a single function event.
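
Roughly, the two layouts above amount to something like this (just a sketch
to show the sizes, not the actual kernel structs; the first 8 bytes are the
common fields every event carries):

	/* 8-byte common header shared by all events (offsets 0-7 above) */
	struct common_hdr {
		unsigned short	type;		/* common_type */
		unsigned char	flags;		/* common_flags */
		unsigned char	preempt_count;	/* common_preempt_count */
		int		pid;		/* common_pid */
	};

	/* "function" event: 8 + 8 + 8 = 24 bytes */
	struct function_event {
		struct common_hdr	hdr;
		unsigned long		ip;
		unsigned long		parent_ip;
	};

	/* current "func_repeats" event: 8 + 8 + 8 + 8 = 32 bytes;
	 * adding a full 64-bit timestamp would make it 40 */
	struct func_repeats_event {
		struct common_hdr	hdr;
		unsigned long		ip;
		unsigned long		pip;
		unsigned long		count;
	};
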
Since this event is new, we could even do another trick (one that the
preemptirq events do):
ID: 434
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:s32 caller_offs; offset:8; size:4; signed:1;
field:s32 parent_offs; offset:12; size:4; signed:1;
print fmt: "caller=%pS parent=%pS", (void *)((unsigned long)(_stext) + REC->caller_offs), (void *)((unsigned long)(_stext) + REC->parent_offs)
The (_stext) is defined in /proc/kallsyms:
# grep stext /proc/kallsyms
ffffffff8d000000 T _stext
And the offsets are relative to that. We could do the same thing here:
field:s32 ip; offset:8; size:4; signed:1;
field:s32 pip; offset:12; size:4; signed:1;
field:s16 count; offset:16; size:2; signed:1;
field:u16 top_ts; offset:18; size:2; signed:0;
field:u32 bottom_ts; offset:20; size:4; signed:0;
That puts count down to 2 bytes. Having more than 64K repeats is
probably not going to happen, and if it does, we could just inject this
event more than once ;-)
And as all events must be 4 byte aligned, we could use 6 bytes for the
timestamp delta, giving us 2^48 nanoseconds of range instead of just 2^32,
that is, making the time covered go from about 4 seconds (2^32 nanoseconds)
to about 78 hours (2^48 nanoseconds).
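
Something like this sketch is what I'm thinking of (made-up names, not real
kernel code): store ip and pip as 32-bit offsets from _stext and split the
nanosecond delta across the two timestamp fields. By my math that's
8 + 4 + 4 + 2 + 2 + 4 = 24 bytes, the same size as a single function event.

	#include <linux/types.h>

	/* compact layout sketch (struct common_hdr as in the sketch above) */
	struct func_repeats_compact {
		struct common_hdr	hdr;
		s32	ip;		/* ip - (unsigned long)_stext */
		s32	pip;		/* parent_ip - (unsigned long)_stext */
		s16	count;		/* number of repeats (capped at 64K) */
		u16	top_ts;		/* bits 32-47 of the timestamp delta */
		u32	bottom_ts;	/* bits 0-31 of the timestamp delta */
	};

	/* split a delta (in ns, up to 2^48) across the two fields */
	static inline void func_repeats_set_delta(struct func_repeats_compact *e,
						  u64 delta)
	{
		e->top_ts = (u16)(delta >> 32);
		e->bottom_ts = (u32)delta;
	}

	/* recombine when printing, e.g. "last ts" = event ts - delta */
	static inline u64 func_repeats_get_delta(struct func_repeats_compact *e)
	{
		return ((u64)e->top_ts << 32) | e->bottom_ts;
	}
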
I would grab the time stamp after saving the event to make sure that it
doesn't go backwards. It may be a little racy (if an interrupt comes in
between), but it's still a "best effort" approach.
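
Roughly like this (again just a sketch, the tracking struct and the write
helper are made-up names; trace_clock_local() is the normal local trace
clock):

	#include <linux/types.h>
	#include <linux/trace_clock.h>

	/* made-up per-cpu record of the last function we saw */
	struct func_repeats_track {
		unsigned long	ip;
		unsigned long	parent_ip;
		u64		last_ts;	/* when it was last recorded */
	};

	static void func_repeats_saw(struct func_repeats_track *t,
				     unsigned long ip, unsigned long parent_ip)
	{
		write_function_event(ip, parent_ip);	/* made-up: record to the ring buffer */
		/* Sample the clock *after* the write; if an interrupt sneaks
		 * in between, last_ts is a little late, but it can never be
		 * earlier than the event we just saved. */
		t->last_ts = trace_clock_local();
		t->ip = ip;
		t->parent_ip = parent_ip;
	}
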
-- Steve