Date:	Mon, 06 Apr 2009 21:06:58 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Corey Ashford <cjashfor@...ux.vnet.ibm.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Paul Mackerras <paulus@...ba.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/6] perf_counter: add more context information

On Mon, 2009-04-06 at 11:53 -0700, Corey Ashford wrote:
> 
> Peter Zijlstra wrote:
> > On Mon, 2009-04-06 at 13:01 +0200, Peter Zijlstra wrote:
> >> On Fri, 2009-04-03 at 11:25 -0700, Corey Ashford wrote:
> >>> Peter Zijlstra wrote:
> >>>> On Thu, 2009-04-02 at 11:12 +0200, Peter Zijlstra wrote:
> >>>>> plain text document attachment (perf_counter_callchain_context.patch)
> >>>>> Put in counts to tell which ips belong to what context.
> >>>>>
> >>>>>   -----
> >>>>>    | |  hv
> >>>>>    | --
> >>>>> nr | |  kernel
> >>>>>    | --
> >>>>>    | |  user
> >>>>>   -----
> >>>> Right, just realized that PERF_RECORD_IP needs something similar if one
> >>>> is not able to derive the context from the IP itself..
> >>>>
> >>> Three individual bits would suffice, or you could use a two-bit code -
> >>> 00 = user
> >>> 01 = kernel
> >>> 10 = hypervisor
> >>> 11 = reserved (or perhaps unknown)
> >>>
> >>> Unfortunately, because of alignment, it would need to take up another 64 
> >>> bit word, wouldn't it?  Too bad you cannot sneak the bits into the IP in 
> >>> a machine independent way.
> >>>
> >>> And since you probably need a separate word, that effectively doubles 
> >>> the amount of space taken up by IP samples (if we add a "no event 
> >>> header" option).  Should we add another bit in the record_type field - 
> >>> PERF_RECORD_IP_LEVEL (or similar) so that user-space apps don't have to 
> >>> get this if they don't need it?
> >> If we limit the event size to 64k (surely enough, right? :-), then we
> >> have 16 more bits to play with in the header, and we could do something
> >> like the below.
> >>
> >> A further possibility would also be to add an overflow bit in there,
> >> making the full 32bit PERF_RECORD space available to output events as
> >> well.
> >>
> >> Index: linux-2.6/include/linux/perf_counter.h
> >> ===================================================================
> >> --- linux-2.6.orig/include/linux/perf_counter.h
> >> +++ linux-2.6/include/linux/perf_counter.h
> >> @@ -201,9 +201,17 @@ struct perf_counter_mmap_page {
> >>  	__u32   data_head;		/* head in the data section */
> >>  };
> >>  
> >> +enum {
> >> +	PERF_EVENT_LEVEL_HV	= 0,
> >> +	PERF_EVENT_LEVEL_KERNEL = 1,
> >> +	PERF_EVENT_LEVEL_USER	= 2,
> >> +};
> >> +
> >>  struct perf_event_header {
> >>  	__u32	type;
> >> -	__u32	size;
> >> +	__u16	level		:  2,
> >> +		__reserved	: 14;
> >> +	__u16	size;
> >>  };
> > 
> > Except we should probably use masks again instead of bitfields so that
> > the thing is portable when streamed to disk, such as would be common
> > with splice().
> 
> One downside of this approach is that if you specify "no header" 
> (currently not possible, but maybe later?), you will not be able to get 
> the level bits.

Would this be desirable? I know we've mentioned it before, but it would
mean one cannot mix various event types (currently that means !mmap and
callchain with difficulty).

As long as we mandate this header, we can have 16 misc bits.
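
For illustration only, a minimal sketch of what the mask-based variant of
the header could look like when built in user space. The struct name, the
misc field name and the PERF_EVENT_MISC_* identifiers are assumptions made
up for this example and are not taken from the patch; only the level values
and the 16/16 split of the old size word come from the discussion above.

```c
#include <stdint.h>

/* Level values as in the proposed enum; the MISC_* names are hypothetical. */
enum {
	PERF_EVENT_MISC_LEVEL_HV	= 0,
	PERF_EVENT_MISC_LEVEL_KERNEL	= 1,
	PERF_EVENT_MISC_LEVEL_USER	= 2,
	PERF_EVENT_MISC_LEVEL_MASK	= 3,	/* low two of the 16 misc bits */
};

/* Hypothetical layout: the old 32-bit size split into misc + size. */
struct perf_event_header_sketch {
	uint32_t	type;
	uint16_t	misc;	/* 2 level bits, 14 bits still free */
	uint16_t	size;	/* limits an event to 64k */
};

/* Read the level out of the misc bits with a plain mask, no bitfield. */
static inline unsigned int
header_level(const struct perf_event_header_sketch *h)
{
	return h->misc & PERF_EVENT_MISC_LEVEL_MASK;
}

/* Store a level while leaving the other 14 misc bits untouched. */
static inline void
header_set_level(struct perf_event_header_sketch *h, unsigned int level)
{
	h->misc = (uint16_t)((h->misc & ~PERF_EVENT_MISC_LEVEL_MASK) |
			     (level & PERF_EVENT_MISC_LEVEL_MASK));
}
```

The mask pins down which bits of the 16-bit word carry the level, which a
compiler-chosen bitfield layout does not. A reader of a stream written on
another machine would still have to deal with byte order separately.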

> How about adding an optional, 64-bit "miscellaneous" word to the event 
> record which could contain a number of small bit fields, any or all of 
> which could be enabled with a PERF_RECORD_* bit.  If one or more of the 
> miscellaneous PERF_RECORD_* bits are set to enable, this assembled word 
> would be added to the record.  So the space cost of the level field goes 
> down as we add more small fields that need to be recorded.
> 
> Something like:
> 
>   PERF_RECORD_LEVEL = 1U << 4,
>   PERF_RECORD_INTR_DEPTH = 1U << 5,
>   PERF_RECORD_STUFF = 1U << 6,
>   ...
> 
> #define __PERF_MISC_MASK(name)                       \
>          (((1ULL << PERF_MISC_##name##_BITS) - 1) <<  \
>           PERF_MISC_##name##_SHIFT)
> 
> #define PERF_MISC_LEVEL_BITS 2
> #define PERF_MISC_LEVEL_SHIFT 0
> #define PERF_MISC_LEVEL_MASK __PERF_MISC_MASK(LEVEL)
> 
> #define PERF_MISC_INTR_DEPTH_BITS 8
> #define PERF_MISC_INTR_DEPTH_SHIFT 2
> #define PERF_MISC_INTR_DEPTH_MASK __PERF_MISC_MASK(INTR_DEPTH)

Yeah, that's the alternative.
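
To make that alternative concrete, here is a rough sketch of how a consumer
might pack and unpack such a 64-bit miscellaneous word. The BITS, SHIFT and
MASK macros are the ones quoted above; PERF_MISC_GET and PERF_MISC_PUT are
hypothetical helpers added for this example only.

```c
#include <stdint.h>

/* Macros as quoted above. */
#define __PERF_MISC_MASK(name)                       \
	(((1ULL << PERF_MISC_##name##_BITS) - 1) <<  \
	 PERF_MISC_##name##_SHIFT)

#define PERF_MISC_LEVEL_BITS 2
#define PERF_MISC_LEVEL_SHIFT 0
#define PERF_MISC_LEVEL_MASK __PERF_MISC_MASK(LEVEL)

#define PERF_MISC_INTR_DEPTH_BITS 8
#define PERF_MISC_INTR_DEPTH_SHIFT 2
#define PERF_MISC_INTR_DEPTH_MASK __PERF_MISC_MASK(INTR_DEPTH)

/* Hypothetical helper: extract one field from the misc word. */
#define PERF_MISC_GET(word, name)                    \
	(((word) & PERF_MISC_##name##_MASK) >> PERF_MISC_##name##_SHIFT)

/* Hypothetical helper: position a value for OR-ing into the misc word. */
#define PERF_MISC_PUT(val, name)                     \
	(((uint64_t)(val) << PERF_MISC_##name##_SHIFT) & \
	 PERF_MISC_##name##_MASK)

/* Assemble a misc word from a level and an interrupt depth. */
static inline uint64_t
perf_misc_pack(unsigned int level, unsigned int intr_depth)
{
	return PERF_MISC_PUT(level, LEVEL) |
	       PERF_MISC_PUT(intr_depth, INTR_DEPTH);
}
```

For example, perf_misc_pack(1, 5) places level 1 (kernel in both encodings
discussed in this thread) in bits 0-1 and the depth in bits 2-9, and
PERF_MISC_GET(word, INTR_DEPTH) reads the depth back out.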