Message-ID: <20130511075008.GC24435@gmail.com>
Date:	Sat, 11 May 2013 09:50:08 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Jiri Olsa <jolsa@...hat.com>, linux-kernel@...r.kernel.org,
	Corey Ashford <cjashfor@...ux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Namhyung Kim <namhyung@...nel.org>,
	Paul Mackerras <paulus@...ba.org>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Andi Kleen <ak@...ux.intel.com>,
	David Ahern <dsahern@...il.com>,
	Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH 0/9] perf: Adding better precise_ip field handling


* Peter Zijlstra <peterz@...radead.org> wrote:

> On Fri, May 10, 2013 at 12:55:36PM +0200, Ingo Molnar wrote:
> > Look at the tools/perf/ patches, they don't actually need or use that 
> > information to adjust for skid!
> > 
> > If user-space wants _that_ level of control because it wants to correct 
> > for skid (if there's skid), or if it wants to display to the user how 
> > precise the profiling is, then they can do the (much) more complex probing 
> > dance.
> > 
> > What is absolutely indefensible is to not give a good shortcut for the 
> > most common case of 'give me the most precise cycles event you got'...
> 
> That's not what I'm saying... the user (not userspace, but you and me) 
> when staring at perf output need to interpret the result.
> 
> If you don't know WTF the thing actually measured, how are you going to 
> do that?

That's really a red herring: there's absolutely no reason why the 
kernel could not pass back the level of precision it provided.

You are also overrating the importance of such details: most developers 
looking at profiler output will assume that it's a statistical result, 
and will be happy when it happens to be "absolutely accurate" instead of 
just "very accurate"...

> > > I see such a feature only causing confusion; I told it to be 
> > > precise, therefore this register op after the memory load really is 
> > > the more expensive thing.
> > 
> > You are creating confusion where there's none: "give me the best 
> > profiling you've got" is a pretty reasonable thing to ask.
> 
> Only if it then tells you what you got. It doesn't do that.

I'm not against the kernel telling us what precision it gave us, at all. 
That could be solved by the kernel setting the precision field in the 
PERF_COUNT_CYCLES_PRECISE case or so.

I'm against you apparently recommending a complex probing method to get 
something the kernel ought to give us straight away in a much simpler 
way...
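
For illustration, the probing approach under discussion can be sketched roughly as follows. This is a minimal sketch, not the perf tool's actual code: `try_open_fn`, `probe_max_precise_ip` and `fake_try_open` are hypothetical names, and the callback stands in for an attempt to open the event with `perf_event_attr.precise_ip` set to the given level. The loop starts at 3, the largest value the 2-bit `precise_ip` field can hold, and steps down until an attempt succeeds.

```c
/* Largest value the 2-bit precise_ip field can hold. */
#define MAX_PRECISE_IP 3

/*
 * Hypothetical callback: try to open the event at the given precise_ip
 * level. Returns 0 on success, nonzero if the level is rejected. In a real
 * tool this would wrap a perf_event_open() call (assumption, not shown).
 */
typedef int (*try_open_fn)(int precise_ip, void *ctx);

/*
 * Walk down from the most precise level and return the first level the
 * callback accepts, or -1 if every level (including 0) is rejected.
 */
static int probe_max_precise_ip(try_open_fn try_open, void *ctx)
{
    for (int level = MAX_PRECISE_IP; level >= 0; level--) {
        if (try_open(level, ctx) == 0)
            return level;
    }
    return -1;
}

/*
 * Stand-in for real hardware, for demonstration only: accepts any level
 * up to the maximum stored in *ctx.
 */
static int fake_try_open(int precise_ip, void *ctx)
{
    int hw_max = *(const int *)ctx;

    return precise_ip <= hw_max ? 0 : -1;
}
```

A caller would open the real event at the returned level, and could also print that level so that whoever reads the profile knows what was measured. Even in this reduced form it takes up to four open attempts to answer a question the kernel could answer in one.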

Complexity does not primarily result in people doing things 'smarter'. It 
primarily results in people _not using the feature at all_.

> > The thing is, there are variations in the quality of profiling between 
> > CPUs, sometimes even between CPU models. 99.999% of the people don't 
> > care about that, because 99.9% of the time the profile is unambiguous: 
> > functions are typically big enough, with the overhead somewhere in the 
> > middle, so skid just doesn't matter.
> 
> Sure at function level it doesn't matter, but once you found your most 
> expensive function very often the next question is _why_ is it 
> expensive.
> 
> At that point you're going to stare at asm output. The moment you do 
> that you need to know the type of output you're staring at.

FYI, very few developers are actually looking at the assembly output 
because very few developers _know_ assembly to begin with.

They are looking at things like sysprof or perf report output, maybe at 
the annotated _source_ code and that's it.

The mapping to source code is fuzzy to begin with, with inlining, loop 
unrolling and other compiler optimizations being a far bigger effect than 
skid.

So the fuzz created by skid is relatively small - but it's nice when it's 
gone and obviously it's helpful when you are looking at assembly output.

> Also, if you think function level output is the most relevant one, you 
> shouldn't use PEBS at all. PEBS has an issue with REP prefixes, it 
> severely under-accounts the cycles spent on them. And since exact 
> placement doesn't matter (as you just argued) the little skid you have 
> is irrelevant.
> 
> So either skid matters and you need to know what type of output you've 
> got, or it doesn't and the whole precise thing is irrelevant at best.

That's just another plain silly argument: having more precise results is 
obviously useful even if you don't use a magnifying lens. Sometimes 
functions are small and skid results in the wrong function being credited 
with overhead.

It's also immaterial: there's no reason why the kernel couldn't feed back 
the level of precision it offers, to user-space, via a small, simple 
variation to the existing syscall interface.
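
One possible shape for such a feedback path, purely as a sketch: the caller asks for "the most precise level available" and the kernel writes the level it actually granted back into the same structure. Everything here is hypothetical and is not part of the real perf ABI: `HYP_PRECISE_MAX`, `struct hyp_precise_req` and `hyp_resolve_precise` are invented for illustration, and `hw_max` stands in for whatever the PMU driver knows about the hardware.

```c
#include <stdint.h>

/* Hypothetical request value: "give me the most precise level you have". */
#define HYP_PRECISE_MAX 0xffu

/* Hypothetical request/response pair; not a real perf structure. */
struct hyp_precise_req {
    uint8_t requested;  /* in:  0..3, or HYP_PRECISE_MAX */
    uint8_t granted;    /* out: level actually provided  */
};

/*
 * Resolve a request against the hardware maximum (0..3).
 * Returns 0 and fills in req->granted on success; returns -1 when an
 * explicit level exceeds what the hardware supports, leaving granted
 * untouched.
 */
static int hyp_resolve_precise(struct hyp_precise_req *req, uint8_t hw_max)
{
    if (req->requested == HYP_PRECISE_MAX) {
        req->granted = hw_max;      /* best effort, reported back */
        return 0;
    }
    if (req->requested > hw_max)
        return -1;                  /* explicit level not available */
    req->granted = req->requested;
    return 0;
}
```

With something along these lines the common case needs a single call, and a tool that cares about skid can still read `granted` to see what it got.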

Thanks,

	Ingo