lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20210308131558.6ee6f98e@gandalf.local.home>
Date:   Mon, 8 Mar 2021 13:15:58 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     "Yordan Karadzhov (VMware)" <y.karadz@...il.com>
Cc:     linux-kernel@...r.kernel.org, tglx@...utronix.de,
        peterz@...radead.org
Subject: Re: [RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"

On Mon, 8 Mar 2021 16:57:29 +0200
"Yordan Karadzhov (VMware)" <y.karadz@...il.com> wrote:

> > If we had:
> > 
> >             <...>-37      [004] ...1  2022.303820: gc_worker <-process_one_work
> >             <...>-37      [004] ...1  2022.303820: ___might_sleep <-gc_worker
> >             <...>-37      [004] ...1  2022.303831: ___might_sleep <-gc_worker (last ts: 2022.303828 repeats: 127)
> >             <...>-37      [004] ...1  2022.303831: queue_delayed_work_on <-process_one_work
> > 
> > We would know the last time __might_sleep was called.
> > 
> > That is, not only should we save the ip and pip in the trace_func_repeats
> > structure, but we should also be storing the last time stamp of the last
> > function event that repeated. Otherwise the above looks like the last
> > __might_sleep called above happened when the queue_delayed_work_on
> > happened, where that may not be the case.  
> 
> If we store the last timestamp, this means we will need to use 
> additional 64b on the buffer, every time we record the "func_repeats" 
> event. This looks like an overkill to me.
> Can we store only the duration of the repeats (the difference between 
> the timestamp)? This way we can use less memory at the price of having 
> one extra arithmetic operation.
> Alternative approach can be to store only the least-significant bits of 
> the timestamp.
> 
> What do you think?

I like the way you're thinking ;-)

Let's take a look at the current event sizes.

The function event is defined as:

name: function
ID: 1
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long ip;	offset:8;	size:8;	signed:0;
	field:unsigned long parent_ip;	offset:16;	size:8;	signed:0;

Showing that it's total size is 24 bytes (offset:16 + size:8)

and your current repeat event has:

ID: 17
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long ip;	offset:8;	size:8;	signed:0;
	field:unsigned long pip;	offset:16;	size:8;	signed:0;
	field:unsigned long count;	offset:24;	size:8;	signed:0;

Which is 32 bytes. Adding another 8 would make it 40. It's bigger than one
function event, but still smaller than two, and adding just 4 bytes is
still some size more than a single function event.

Since this event is new, we could even do another trick (one that the
preemptirq events do:

ID: 434
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:s32 caller_offs;	offset:8;	size:4;	signed:1;
	field:s32 parent_offs;	offset:12;	size:4;	signed:1;

print fmt: "caller=%pS parent=%pS", (void *)((unsigned long)(_stext) + REC->caller_offs), (void *)((unsigned long)(_stext) + REC->parent_offs)

The (_stext) is defined in /proc/kallsyms:

# grep stext /proc/kallsyms 
ffffffff8d000000 T _stext

And the offsets are off of that, which we could do the same thing here.

	field:s32 ip;	offset:8;	size:4;	signed:1;
	field:s32 pip;	offset:12;	size:4;	signed:1;
	field:s16 count;	offset:20;	size:2;	signed:1;
	field:u16 top_ts;	offset:22;	size:2; signed:0;
	field:u32 bottom_ts;	offset:24;	size:4; signed:0;

By putting count down to 2 bytes. Having more than 64K repeats is
probably not going to happen, and if it does, we could inject this more
than once ;-)

And as all events must be 4 byte aligned, we could use 6 bytes for the
offset, giving us: 2^40 bits instead of just 2^32 bits, that is, making the
time go from 4 seconds (2^32 nanoseconds) to 18 minutes (2^40 nanoseconds).

I would grab the time stamp after saving the event to make sure that it
doesn't go backwards. It may be a little racy (if an interrupt comes in
between), but it's still a "best effort" approach.

-- Steve

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ