Message-ID: <1266848978.6122.195.camel@laptop>
Date: Mon, 22 Feb 2010 15:29:38 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Stephane Eranian <eranian@...gle.com>
Cc: linux-kernel@...r.kernel.org, mingo@...e.hu, paulus@...ba.org,
davem@...emloft.net, fweisbec@...il.com, robert.richter@....com,
perfmon2-devel@...ts.sf.net, eranian@...il.com
Subject: Re: [RFC] perf_events: how to add Intel LBR support
On Mon, 2010-02-22 at 15:07 +0100, Stephane Eranian wrote:
> On Thu, Feb 18, 2010 at 11:25 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> > On Sun, 2010-02-14 at 11:12 +0100, Peter Zijlstra wrote:
> >>
> >> Dealing with context switches is also going to be tricky, where we have
> >> to safe and 'restore' LBR stacks for per-task counters.
> >
> > OK, so I poked at the LBR hardware a bit, sadly the TOS really doesn't
> > count beyond the few bits it requires :-(
> >
>
> The TOS is also a read-only MSR.
well, r/o is fine.
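For reference, reading it is about as simple as it gets; the MSR number
below is the NHM one from the SDM and the names are made up here, but it
shows the point: the register only ever holds the current stack index in
its low bits, there is no free-running count to snapshot.

#include <asm/msr.h>

#define LBR_TOS_MSR	0x01c9	/* MSR_LASTBRANCH_TOS on NHM, SDM Vol 3B */
#define LBR_DEPTH	16	/* NHM LBR stack depth */

static u64 lbr_read_tos(void)
{
	u64 tos;

	/* read-only; only the low bits indexing the stack are meaningful */
	rdmsrl(LBR_TOS_MSR, tos);

	return tos & (LBR_DEPTH - 1);
}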
> > I had hopes it would, since that would make it easier to share the LBR,
> > simply take a TOS snapshot when you schedule the counter in, and never
> > roll back further for that particular counter.
> >
> > As it stands we'll have to wipe the full LBR state every time we 'touch'
> > it, which makes it less useful for cpu-bound counters.
> >
> Yes, you need to clean it up each time you snapshot it and each time
> you restore it.
>
> The patch does not seem to handle LBR context switches.
Well, it does, but sadly not in a viable way: it assumes the TOS counts
beyond the few bits it requires and stops the unwind at the hwc->lbr_tos
snapshot. Except that the TOS doesn't work that way.
This whole PEBS/LBR stuff is a massive trainwreck from a design pov.
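Roughly what the per-task switch path would end up doing, assuming the
NHM stack MSRs are writable (a sketch only, made-up names, not from the
patch); wiping is just restoring zeroes:

#include <linux/types.h>
#include <asm/msr.h>

#define LBR_FROM_MSR(i)	(0x0680 + (i))	/* MSR_LASTBRANCH_x_FROM_IP, NHM */
#define LBR_TO_MSR(i)	(0x06c0 + (i))	/* MSR_LASTBRANCH_x_TO_IP, NHM */
#define LBR_DEPTH	16

struct lbr_ctx {
	u64 from[LBR_DEPTH];
	u64 to[LBR_DEPTH];
};

/* save the stack when a per-task event schedules out... */
static void lbr_save(struct lbr_ctx *ctx)
{
	int i;

	for (i = 0; i < LBR_DEPTH; i++) {
		rdmsrl(LBR_FROM_MSR(i), ctx->from[i]);
		rdmsrl(LBR_TO_MSR(i), ctx->to[i]);
	}
}

/* ...and restore it when it schedules back in; the TOS itself is r/o so
 * the index cannot be put back, another reason the snapshot trick fails. */
static void lbr_restore(struct lbr_ctx *ctx)
{
	int i;

	for (i = 0; i < LBR_DEPTH; i++) {
		wrmsrl(LBR_FROM_MSR(i), ctx->from[i]);
		wrmsrl(LBR_TO_MSR(i), ctx->to[i]);
	}
}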
> > Also, not all hw (core and pentium-m) supports the freeze_lbrs_on_pmi
> > bit, what we could do for those is stick an unconditional LBR disable
> > very early in the NMI path and simply roll back the stack until we hit a
> > branch into the NMI vector, that should leave a few usable LBR entries.
> >
> You need to be consistent across the CPUs. If a CPU does not provide
> freeze_on_pmi, then I would simply not support it as a first approach.
> Same thing if the LBR is less than 4-deep. I don't think you'll get anything
> useful out of it.
Well, if at the first branch into the NMI handler you do an
unconditional LBR disable, you should still have 3 usable records. But
yeah, the 1-deep LBR chips (P6 and AMD) are pretty useless for this
purpose and are indeed not supported.
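Something along these lines, as the very first thing in the NMI path,
should do for the no-freeze case (a sketch; the DEBUGCTL bits are per
the SDM, bit 0 is LBR, bit 11 is the freeze-on-PMI bit the older chips
lack):

#include <asm/msr.h>

#define DEBUGCTL_LBR			(1ULL << 0)	/* IA32_DEBUGCTL.LBR */
#define DEBUGCTL_FREEZE_LBRS_ON_PMI	(1ULL << 11)	/* missing on core/pentium-m */

/* stop the LBR before the handler's own branches overwrite the stack;
 * afterwards unwind the stack and drop entries that branch into the
 * NMI vector */
static void lbr_nmi_freeze(void)
{
	u64 debugctl;

	rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
	wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl & ~DEBUGCTL_LBR);
}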
> The patch does not address the configuration options available on Intel
> Nehalem/Westmere, i.e., LBR_SELECT (see Vol 3a table 16-9). We can
> handle priv level separately as it can be derived from the event exclude_*.
> But if you want to allow multiple events in a group to use PERF_SAMPLE_LBR
> then you need to ensure LBR_SELECT is set to the same value, priv levels
> included.
Yes, I explicitly skipped that because of the HT thing and because, as I
argued in an earlier reply, I don't see much use for it; it significantly
complicates matters for not much (if any) benefit.
As it stands LBR seems much more like a hw-breakpoint feature than a PMU
feature, except for this trainwreck called PEBS.
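FWIW the priv level part alone is trivial to derive from the event,
something like this (bit numbers per that table 16-9, set means 'do not
capture'; the names are made up here):

#include <linux/perf_event.h>

#define LBR_SEL_CPL_EQ_0	(1ULL << 0)	/* drop branches in ring 0 */
#define LBR_SEL_CPL_NEQ_0	(1ULL << 1)	/* drop branches in ring >0 */

static u64 lbr_select_from_attr(struct perf_event_attr *attr)
{
	u64 sel = 0;

	if (attr->exclude_kernel)
		sel |= LBR_SEL_CPL_EQ_0;
	if (attr->exclude_user)
		sel |= LBR_SEL_CPL_NEQ_0;

	return sel;
}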
> Furthermore, LBR_SELECT is shared between HT threads. We need to either
> add another field in perf_event_attr or encode this in the config field,
> though that is ugly because it relates to the sample_type rather than
> to the event.
>
> The patch is missing the sampling part, i.e., dump of the LBR (in sequential
> order) into the sampling buffer.
Yes, I just hacked enough stuff together to poke at the hardware a bit,
never said it was anywhere near complete.
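The dump itself would just walk back from the TOS so the entries come out
newest first, something like this (same made-up MSR names as above, with
the output-buffer plumbing omitted):

#include <asm/msr.h>

#define LBR_TOS_MSR	0x01c9
#define LBR_FROM_MSR(i)	(0x0680 + (i))
#define LBR_TO_MSR(i)	(0x06c0 + (i))
#define LBR_DEPTH	16

/* copy the LBR stack out, most recent branch first */
static int lbr_dump(u64 *from, u64 *to)
{
	u64 tos;
	unsigned int i, idx;

	rdmsrl(LBR_TOS_MSR, tos);

	for (i = 0; i < LBR_DEPTH; i++) {
		idx = (tos - i) & (LBR_DEPTH - 1);
		rdmsrl(LBR_FROM_MSR(idx), from[i]);
		rdmsrl(LBR_TO_MSR(idx), to[i]);
	}

	return LBR_DEPTH;
}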
> I would also select a better name than PERF_SAMPLE_LBR. LBR is an
> Intel thing. Maybe PERF_SAMPLE_TAKEN_BRANCH.
Either LAST_BRANCH (suggesting a single entry) or BRANCH_STACK
(suggesting >1 possible entries) seems more appropriate.
Supporting only a single entry, LAST_BRANCH, seems like an attractive
enough option; the use of multiple steps back seems rather pointless for
interpreting the sample.
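If it does end up as BRANCH_STACK, the sample payload could be as dumb
as a count followed by from/to pairs, e.g. (a purely hypothetical
layout, not anything in the ABI today):

#include <linux/types.h>

/* hypothetical on-buffer layout for a BRANCH_STACK sample, newest first */
struct branch_entry {
	__u64	from;
	__u64	to;
};

struct branch_stack {
	__u64			nr;		/* valid entries following */
	struct branch_entry	entries[];	/* up to the hw LBR depth */
};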