Message-ID: <20110422204740.GA21364@elte.hu>
Date:	Fri, 22 Apr 2011 22:47:40 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Stephane Eranian <eranian@...gle.com>
Cc:	Arnaldo Carvalho de Melo <acme@...radead.org>,
	linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Lin Ming <ming.m.lin@...el.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>, eranian@...il.com,
	Arun Sharma <asharma@...com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add
 missing user space support for config1/config2


* Stephane Eranian <eranian@...gle.com> wrote:

> On Fri, Apr 22, 2011 at 3:18 PM, Ingo Molnar <mingo@...e.hu> wrote:
> >
> > * Stephane Eranian <eranian@...gle.com> wrote:
> >
> > > > Say i'm a developer and i have an app with such code:
> > > >
> > > > #define THOUSAND 1000
> > > >
> > > > static char array[THOUSAND][THOUSAND];
> > > >
> > > > int init_array(void)
> > > > {
> > > >        int i, j;
> > > >
> > > >        for (i = 0; i < THOUSAND; i++) {
> > > >                for (j = 0; j < THOUSAND; j++) {
> > > >                        array[j][i]++;
> > > >                }
> > > >        }
> > > >
> > > >        return 0;
> > > > }
> > > >
> > > > Pretty common stuff, right?
> > > >
> > > > Using the generalized cache events i can run:
> > > >
> > > >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> > > >
> > > >  Performance counter stats for './array' (10 runs):
> > > >
> > > >         6,719,130 cycles:u                   ( +-   0.662% )
> > > >         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
> > > >         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
> > > >         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> > > >
> > > >        0.003802098  seconds time elapsed   ( +-  13.395% )
> > > >
> > > > I consider that this is 'bad', because for almost every dcache-load there's a
> > > > dcache-miss - a ~97% L1 cache miss rate!
> > > >
> > > > Then i think a bit, notice something, apply this performance optimization:
> > >
> > > I don't think this example is really representative of the kind of problems
> > > people face, it is just too small and obvious. [...]
> >
> > Well, the overwhelming majority of performance problems are 'small and obvious'
> 
> Problems are not simple. Most serious applications these days are huge, 
> hundreds of MB of text, if not GB.
> 
> In your artificial example, you knew the answer before you started the 
> measurement.
>
> Most of the time, applications are assembled out of hundreds of libraries, so 
> no single developer knows all the code. Thus, the performance analyst is 
> faced with a black box most of the time.

I isolated an example and assumed that you'd agree that identifying hot 
spots is trivial with generic cache events.

My assumption was wrong so let me show you how trivial it really is.

Here's an example with *two* problematic functions (but it could have hundreds, 
it does not matter):

-------------------------------->
#define THOUSAND 1000

static char array1[THOUSAND][THOUSAND];

static char array2[THOUSAND][THOUSAND];

void func1(void)
{
	int i, j;

	for (i = 0; i < THOUSAND; i++)
		for (j = 0; j < THOUSAND; j++)
			array1[i][j]++;
}

void func2(void)
{
	int i, j;

	for (i = 0; i < THOUSAND; i++)
		for (j = 0; j < THOUSAND; j++)
			array2[j][i]++;
}

int main(void)
{
	for (;;) {
		func1();
		func2();
	}

	return 0;
}
<--------------------------------

We do not know which one has the cache-miss problem, func1() or func2(); it's 
all a black box, right?

Using generic cache events you simply type this:

 $ perf top -e l1-dcache-load-misses -e l1-dcache-loads

And you get output like this:

   PerfTop:    1923 irqs/sec  kernel: 0.0%  exact:  0.0% [l1-dcache-load-misses:u/l1-dcache-loads:u],  (all, 16 CPUs)
-------------------------------------------------------------------------------------------------------

   weight    samples  pcnt funct DSO
   ______    _______ _____ _____ ______________________

      1.9       6184 98.8% func2 /home/mingo/opt/array2
      0.0         69  1.1% func1 /home/mingo/opt/array2

It has pinpointed the problem in func2 *very* precisely.

Obviously this can be used to analyze larger apps as well, with thousands of 
functions, to pinpoint cache-miss problems in specific functions.
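
Once func2 has been singled out, the likely cure is small. What follows is 
only a minimal sketch, assuming the fix is the usual one of swapping the 
indices so that the inner loop walks contiguous bytes, as func1() already 
does. The names func2_rowmajor() and array2_all_equal() are hypothetical and 
not part of the program above; the latter is just a helper to confirm that 
both loop orders do the same work.

```c
#define THOUSAND 1000

static char array2[THOUSAND][THOUSAND];

/*
 * Access pattern from the example above: the inner loop advances the
 * first index, so consecutive accesses are THOUSAND bytes apart and
 * almost every one lands on a different cache line.
 */
void func2(void)
{
	int i, j;

	for (i = 0; i < THOUSAND; i++)
		for (j = 0; j < THOUSAND; j++)
			array2[j][i]++;
}

/*
 * Hypothetical fix (an assumption, not taken from the original mail):
 * same work, but the inner loop advances the second index, so
 * consecutive accesses are adjacent bytes within a row.
 */
void func2_rowmajor(void)
{
	int i, j;

	for (i = 0; i < THOUSAND; i++)
		for (j = 0; j < THOUSAND; j++)
			array2[i][j]++;
}

/* Helper: returns 1 if every element of array2 equals 'expected'. */
int array2_all_equal(char expected)
{
	int i, j;

	for (i = 0; i < THOUSAND; i++)
		for (j = 0; j < THOUSAND; j++)
			if (array2[i][j] != expected)
				return 0;

	return 1;
}
```

Both variants increment every element exactly once per call, so swapping one 
for the other does not change the result, only the order in which memory is 
touched. Whether the miss count really drops is something to confirm by 
re-running the same perf commands shown above.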

Thanks,

	Ingo