Message-ID: <20150330134531.GV23123@twins.programming.kicks-ass.net>
Date: Mon, 30 Mar 2015 15:45:31 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Kan Liang <kan.liang@...el.com>
Cc: linux-kernel@...r.kernel.org, mingo@...nel.org, acme@...radead.org,
eranian@...gle.com, andi@...stfloor.org
Subject: Re: [PATCH V5 4/6] perf, x86: handle multiple records in PEBS buffer
On Mon, Feb 23, 2015 at 09:25:54AM -0500, Kan Liang wrote:
> From: Yan, Zheng <zheng.z.yan@...el.com>
>
> When PEBS interrupt threshold is larger than one, the PEBS buffer
> may include multiple records for each PEBS event. This patch makes
> the code first count how many records each PEBS event has, then
> output the samples in batch.
>
> One corner case worth mentioning is that the PEBS hardware doesn't
> deal well with collisions, when PEBS events happen near to each
> other. The records for the events can be collapsed into a single
> one, and it's not possible to reconstruct all events that caused
> the PEBS record. However, in practice collisions are extremely rare,
> as long as different events are used. The periods are typically very
> large, so any collision is unlikely. When a collision happens, we drop
> the PEBS record.
>
> Here are some numbers about collisions.
> Four frequently occurring events
> (cycles:p,instructions:p,branches:p,mem-stores:p) are tested
>
> Test events which are sampled together              collision rate
> cycles:p,instructions:p                             0.25%
> cycles:p,instructions:p,branches:p                  0.30%
> cycles:p,instructions:p,branches:p,mem-stores:p     0.35%
>
> cycles:p,cycles:p                                   98.52%
>
> Collisions are extremely rare as long as different events are used. The
> only way you can get a lot of collisions is when you count the same thing
> multiple times. But it is not a useful configuration.

This fails to mention the other problem the status field has. You also
did not specify what exact condition you counted as a collision.

The PEBS status field is a copy of the GLOBAL_STATUS MSR at assist time;
this means that:

 - it's possible (and harmless) for the status field to contain set bits
   for !PEBS events -- the proposed code is buggy here.

 - it's possible to have multiple PEBS bits set even though the record
   really was for only a single event -- if you count everything with
   multiple PEBS bits set as a collision you're counting wrong.

So once again, a coherent story here please.

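To make the two points above concrete, here is a minimal userspace sketch, not kernel code. The names `classify_pebs_status`, `pebs_mask` and the enum are hypothetical; `pebs_mask` stands for "the set of counters currently running as PEBS counters", however the driver tracks that. The sketch only shows that non-PEBS bits have to be cleared before the status word is interpreted, and that more than one remaining bit is ambiguous rather than proof of a collision.

```c
#include <stdint.h>

/* Hypothetical classification of one PEBS record's status word. */
enum pebs_status_kind {
	PEBS_STATUS_NONE,	/* no enabled PEBS counter bit is set */
	PEBS_STATUS_SINGLE,	/* exactly one enabled PEBS counter bit is set */
	PEBS_STATUS_MULTIPLE,	/* several set: ambiguous, may or may not be a collision */
};

/*
 * status:    status field copied from the record (assumed to mirror
 *            GLOBAL_STATUS at assist time, so it may carry unrelated bits).
 * pebs_mask: assumed bitmask of counters that are PEBS-enabled.
 */
static enum pebs_status_kind
classify_pebs_status(uint64_t status, uint64_t pebs_mask)
{
	uint64_t s = status & pebs_mask;	/* drop all !PEBS bits first */

	if (!s)
		return PEBS_STATUS_NONE;
	if (s & (s - 1))			/* more than one bit left */
		return PEBS_STATUS_MULTIPLE;
	return PEBS_STATUS_SINGLE;
}
```

With `pebs_mask = 0x3`, a status of `0x1 | (1ULL << 32)` still classifies as a single-counter record, whereas a plain `status != (1 << bit)` comparison would have rejected it.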
>  static void __intel_pmu_pebs_event(struct perf_event *event,
> +				   struct pt_regs *iregs,
> +				   void *at, void *top, int count)
>  {
> +	struct perf_output_handle handle;
> +	struct perf_event_header header;
>  	struct perf_sample_data data;
>  	struct pt_regs regs;
>
> +	if (!intel_pmu_save_and_restart(event) &&
> +	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
>  		return;
>
> +	setup_pebs_sample_data(event, iregs, at, &data, &regs);
>
> +	if (perf_event_overflow(event, &data, &regs)) {
>  		x86_pmu_stop(event, 0);
> +		return;
> +	}
> +
> +	if (count <= 1)
> +		return;
> +
> +	at += x86_pmu.pebs_record_size;
> +	count--;
> +
> +	perf_sample_data_init(&data, 0, event->hw.last_period);
> +	perf_prepare_sample(&header, &data, event, &regs);
> +
> +	if (perf_output_begin(&handle, event, header.size * count))
> +		return;
> +
> +	for (; at < top; at += x86_pmu.pebs_record_size) {
> +		struct pebs_record_nhm *p = at;
> +
> +		if (p->status != (1 << event->hw.idx))
> +			continue;
> +
> +		setup_pebs_sample_data(event, iregs, at, &data, &regs);
> +		perf_output_sample(&handle, &header, &data, event);
> +
> +		count--;
> +		if (count == 0)
> +			break;
> +	}
> +
> +	perf_output_end(&handle);
>  }

This can use a comment on why this is funny like this. I have vague
memories, but a comment helps everybody who doesn't have those memories
-- which will include me in a year or so.

What I cannot remember is why we call overflow on the first, not the
last event.

>  static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>  {
>  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>  	struct debug_store *ds = cpuc->ds;
> +	struct perf_event *event;
> +	void *base, *at, *top;
>  	int bit;
> +	int counts[MAX_PEBS_EVENTS] = {};
>
>  	if (!x86_pmu.pebs_active)
>  		return;
>
> +	base = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
>  	top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
>
>  	ds->pebs_index = ds->pebs_buffer_base;
>
> +	if (unlikely(base >= top))
>  		return;
>
> +	for (at = base; at < top; at += x86_pmu.pebs_record_size) {
>  		struct pebs_record_nhm *p = at;
>
> +		bit = find_first_bit((unsigned long *)&p->status,
> +				     x86_pmu.max_pebs_events);
> +		if (bit >= x86_pmu.max_pebs_events)
> +			continue;
> +		/*
> +		 * The PEBS hardware does not deal well with collisions,
> +		 * when the same event happens near to each other. The
> +		 * records for the events can be collapsed into a single
> +		 * one, and it's not possible to reconstruct all events
> +		 * that caused the PEBS record. However in practice, the
> +		 * collisions are extremely rare. If collision happened,
> +		 * we drop the record. its the safest choice.
> +		 */
> +		if (p->status != (1 << bit))
> +			continue;

As per the above, this is buggy. You should start by masking p->status
with x86_pmu.pebs_active to clear all !PEBS counter bits.

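As a rough illustration of "mask first, then test", the counting pass could look like the following standalone sketch. It is not the kernel implementation: `count_pebs_records`, `SKETCH_MAX_PEBS` and the flat array of status words are made up for the example, and `pebs_mask` is again an assumed bitmask of PEBS-enabled counters. Skipping records that still have several bits set simply mirrors what the patch does; whether those records should be dropped is the open question raised above.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PEBS 8	/* hypothetical number of PEBS-capable counters */

/*
 * Count records per counter after clearing every bit that does not
 * belong to an enabled PEBS counter. Records that end up with zero or
 * several bits set are skipped (the patch's policy, kept as-is here).
 */
static void count_pebs_records(const uint64_t *status, size_t nr,
			       uint64_t pebs_mask,
			       int counts[SKETCH_MAX_PEBS])
{
	const uint64_t range = (1ULL << SKETCH_MAX_PEBS) - 1;
	size_t i;
	int bit;

	for (bit = 0; bit < SKETCH_MAX_PEBS; bit++)
		counts[bit] = 0;

	for (i = 0; i < nr; i++) {
		uint64_t s = status[i] & pebs_mask & range;

		if (!s || (s & (s - 1)))
			continue;	/* nothing left, or ambiguous */

		/* s has exactly one bit set; find its index */
		for (bit = 0; !(s & 1); bit++)
			s >>= 1;

		counts[bit]++;
	}
}
```

The second pass in the patch (the one that looks for the first matching record per counter) would need the same masking, otherwise it can disagree with the counts computed here.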
> +		if (!test_bit(bit, cpuc->active_mask))
> +			continue;
> +		event = cpuc->events[bit];
> +		WARN_ON_ONCE(!event);
> +		if (!event->attr.precise_ip)
> +			continue;
> +		counts[bit]++;
> +	}
>
> +	for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
> +		if (counts[bit] == 0)
>  			continue;
> +		event = cpuc->events[bit];
> +		for (at = base; at < top; at += x86_pmu.pebs_record_size) {
> +			struct pebs_record_nhm *p = at;
>
> +			if (p->status == (1 << bit))
> +				break;
> +		}
> +		__intel_pmu_pebs_event(event, iregs, at, top, counts[bit]);
>  	}
>  }