linux-kernel - Re: re-enable Nehalem raw Offcore-Events support

Message-ID: <20110429185741.GB10217@elte.hu>
Date:	Fri, 29 Apr 2011 20:57:41 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Vince Weaver <vweaver1@...s.utk.edu>
Cc:	torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Stephane Eranian <eranian@...il.com>,
	Andi Kleen <ak@...ux.intel.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: re-enable Nehalem raw Offcore-Events support


* Vince Weaver <vweaver1@...s.utk.edu> wrote:

> On Fri, 29 Apr 2011, Ingo Molnar wrote:
> 
> > Firstly, one technical problem i have with the raw events ABI method is that it 
> > was added in commit e994d7d23a0b ("perf: Fix LLC-* events on Intel 
> > Nehalem/Westmere"). The raw ABI bit was done 'under the radar', it was not the 
> > declared title of the commit, it was not declared in the changelog either and 
> > it was not my intention to offer such an ABI prematurely either - and i noticed 
> > those two lines too late - but still in time to not let this slip into v2.6.39.
> 
> The initial patches from November seem to make it clear what is being done 
> here.  I thought it was pretty obvious to those reviewing those patches what 
> was involved.  How would I have known that OFFCORE_RESPONSE support was 
> coming if I didn't see the patches obviously float by on linux-kernel?

Not really, Peter did a lot of review of those patches and they were changed 
beyond recognition from their original form - i think Peter wrote a fair 
portion of the supporting cleanups, as Andi seemed uninterested in acting 
quickly on review feedback.

> > Thirdly, and this is my most fundamental objection, i also object to the 
> > timing of this offcore raw access ABI, because past experience is that we 
> > *really* do not want to allow raw PMU details without *first* having 
> > generic abstractions and generic events first.
> 
> why?  Can you explain this better?

Didn't i do that in the rest of my reply? You even quote some of it below.

> > The thing is, as far as i can see you and Andi are *still* pushing the 
> > failed perfmon and Oprofile ABI and tooling models.
> 
> what ABI? 

Well, the raw events ABI reminds me of the perfmon2/perfmon3 ABI: get the raw 
PMU to user-space as quickly as possible and leave all the details to 
user-space. I do not agree with that model of exposing performance measurement 
hardware features.

> [...] by the way, I hate oprofile and never use it.

I don't 'hate' oprofile per se (hey, i still keep pulling and pushing oprofile 
bits from Robert), i just find it very unintuitive and cumbersome to use, and i 
think it was misdesigned in several ways.

> perfmon2 and perfctr are very similar to perf_events in that they provide 
> lightly massaged access to the MSRs so you can program whatever raw event 
> that you like.

perf events (the kernel side) has a very, very different design from perfmon2 
and perfctr - but judging by your past replies you do not seem to recognize 
such design aspects, let alone appreciate them.

> It's true that the *userspace* tools (pfmon, iperfex, PAPI) handle things 
> differently than perf, but that's a *userspace* API, not a kernel ABI.  You 
> seem to keep confusing this.

No, i do not think i am confused, i just disagree with you.

> > We put structure, proper abstractions and easy tooling *ahead* of the 
> > interests of a small group of people who'd rather prefer a lowlevel, opaque 
> > hardware channel so that they do not have to *think* about generalization 
> > and also perhaps so they do not have to share their selection of events and 
> > analysis methods with others ...
> 
> And generalization across platforms (and even across minor chip revisions) 
> *doesn't work*.

Why not? We cannot generalize everything, but generalizing the major CPU 
concepts works quite well for perf. The thing is, the laws of physics are the 
same for all CPUs so they all seem to employ very similar concepts and measure 
those concepts in similar ways, with similar events.

But it's more than that, generalization works even on the *hardware* level:

AMD managed to keep a large chunk of their events stable even across very 
radical changes of the underlying hardware. I have two AMD systems produced 
*10* years apart and they even use the same event encodings for the major 
events.

Intel started introducing stable event definitions a couple of years ago as 
well.

So i think i can say with fairly high confidence that you simply do not know 
what you are talking about.

> [...]  It lasted maybe a year in PAPI before it was realized to be 
> unworkable.  Talk to some people from AMD or Intel if you want.  It's not 
> possible to sanely generalize perf counters.  They are too tied to hardware 
> quirks.

I have the exact opposite experience: chip designers we talked to were clearly 
supportive of the generalizations perf events offers and clearly both AMD and 
Intel chips are moving *towards* more stable, more generic and more flexible 
performance event measurement methods.

We are getting more counters and with fewer constraints. Even the hardware is 
slowly but surely abstracting things out.

It is in the interest of PMU designers as well that their stuff moves one level 
higher within OSs and does not stay at the weird hardware-specific level. 
Hardware is getting more complex, measuring it becomes more complex, so making 
things more generic certainly helps.

Thanks,

	Ingo