lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 27 Nov 2013 23:17:43 +0900
From:	Namhyung Kim <namhyung@...nel.org>
To:	Arnaldo Carvalho de Melo <acme@...stprotocols.net>
Cc:	Stanislav Fomichev <stfomichev@...dex-team.ru>,
	Chia-I Wu <olvaffe@...il.com>, a.p.zijlstra@...llo.nl,
	paulus@...ba.org, mingo@...hat.com, linux-kernel@...r.kernel.org,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH] perf timechart: dynamically determine event data offset

Hi Arnaldo,

2013-11-27 (수), 10:44 -0300, Arnaldo Carvalho de Melo:
> Em Wed, Nov 27, 2013 at 05:49:02PM +0900, Namhyung Kim escreveu:
> > On Tue, 26 Nov 2013 18:54:37 +0400, Stanislav Fomichev wrote:
> > > Since b000c8065a92 "tracing: Remove the extra 4 bytes of padding in events"
> > > removed padding bytes, perf timechart got out of sync with the kernel's
> > > trace_entry structure.
> 
> > > We can't just align perf's trace_entry definition with the kernel because we
> > > want timechart to continue working with old perf.data. Instead, we now
> > > calculate event data offset dynamically using offset of first non-common
> > > event field in the perf.data.
>  
> > [SNIP]
> > > @@ -304,13 +322,11 @@ struct trace_entry {
> > >  	unsigned char		flags;
> > >  	unsigned char		preempt_count;
> > >  	int			pid;
> > > -	int			lock_depth;
> > >  };
>  
> > I had no chance to look into the timechart code in detail, but this is
> > not good.  The format of each trace event (so the struct trace_entry)
> > was described in the format file under the event directory on debugfs.
> > For cpu_frequency event, I get the following:
>  
> >   $ cat /sys/kernel/debug/tracing/events/power/cpu_frequency/format 
> >   name: cpu_frequency
> >   ID: 315
> >   format:
> > 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> > 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> > 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> > 	field:int common_pid;	offset:4;	size:4;	signed:1;
>  
> > 	field:u32 state;	offset:8;	size:4;	signed:0;
> > 	field:u32 cpu_id;	offset:12;	size:4;	signed:0;
>  
> >   print fmt: "state=%lu cpu_id=%lu", (unsigned long)REC->state, (unsigned long)REC->cpu_id
>  
> > So it's not same as above struct trace_entry even with your change.
> 
> It is... Albeit working just with patches is good and dandy and I even
> was discussing with Jiri how patches should be clean and small, you're
> not seeing the whole picture, see what is at the line just before
> "unsigned char flags":
> 
> struct trace_entry {
>         unsigned short          type;
>         unsigned char           flags;
>         unsigned char           preempt_count;
>         int                     pid;
>         int                     lock_depth;
> };

Argh, I missed that part, sorry. :)

> 
> Looking with pahole to see the offsets on a 64-bit arch:
> 
> [acme@...andy linux]$ pahole -C trace_entry /tmp/build/perf/builtin-timechart.o 
> struct trace_entry {
> 	short unsigned int         type;                 /*     0     2 */
> 	unsigned char              flags;                /*     2     1 */
> 	unsigned char              preempt_count;        /*     3     1 */
> 	int                        pid;                  /*     4     4 */
> 	int                        lock_depth;           /*     8     4 */
> 
> 	/* size: 12, cachelines: 1, members: 5 */
> 	/* last cacheline: 12 bytes */
> };
> [acme@...andy linux]$
> [acme@...andy linux]$ uname -a
> Linux ssdandy 3.12.0-rc6+ #2 SMP Mon Oct 21 11:19:07 BRT 2013 x86_64 x86_64 x86_64 GNU/Linux
> 
> And now on a 32-bit arch:
> 
> acme@...ux-goap:~> pahole -C trace_entry /tmp/build/perf/builtin-timechart.o 
> struct trace_entry {
> 	short unsigned int         type;                 /*     0     2 */
> 	unsigned char              flags;                /*     2     1 */
> 	unsigned char              preempt_count;        /*     3     1 */
> 	int                        pid;                  /*     4     4 */
> 	int                        lock_depth;           /*     8     4 */
> 
> 	/* size: 12, cachelines: 1, members: 5 */
> 	/* last cacheline: 12 bytes */
> };
> acme@...ux-goap:~> uname -a
> Linux linux-goap 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May 31 20:21:23 UTC 2013 (97c14ba) i686 i686 i386 GNU/Linux
> acme@...ux-goap:~>
> 
> Same signature, 32-bit, 64-bit userland, so whoever wrote timechart, Arjan, I
> think, made no mistakes at using the kernel exported interface, choosing the
> most efficient way to extract the data, casting to a struct.
> 
> Yeah, using the interface that searches for a field name to get the offset and
> then add it to a pointer to then cast to the type will allow maximum flexibility,
> but is not really efficient.
> 
> Doing that for something that is not performance critical (probably) as
> timechart probably is not a problem, but so is not a problem using the patch
> that does the cast after finding the offset of the first non-common field.

I'm not sure how it affects the performance really.  But in most cases,
using the libtraceevent would be a better solution IMHO.

> 
> Having said that, I'll take the latest patch, using perf_evsel__intval et all,
> if Namhyung sees no further problems with it.

Okay, after a quick glance I don't see anymore issue.  You can add my
Ack there.

> 
> BTW: in 'perf trace' I recently did some work to cache the offsets on a per evsel
> private area, so that the string lookup is not done anymore in the raw_syscalls
> tracepoint processing.

Good job. :)

Thanks,
Namhyung



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists