Message-ID: <CABPqkBRdK5WwokEWE3tQZiAyO3pWbS9aUn7HUkQT+XsMYfJUiA@mail.gmail.com>
Date:	Mon, 8 Jul 2013 22:20:21 +0200
From:	Stephane Eranian <eranian@...gle.com>
To:	Dave Hansen <dave.hansen@...el.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"mingo@...e.hu" <mingo@...e.hu>, dave.hansen@...ux.intel.com,
	"ak@...ux.intel.com" <ak@...ux.intel.com>,
	Jiri Olsa <jolsa@...hat.com>
Subject: Re: [PATCH] perf: fix interrupt handler timing harness

On Mon, Jul 8, 2013 at 10:05 PM, Dave Hansen <dave.hansen@...el.com> wrote:
> On 07/08/2013 11:08 AM, Stephane Eranian wrote:
>> I admit I have some issues with your patch and what it is trying to avoid.
>> There is already interrupt throttling. Your code seems to address latency
>> issues in the handler rather than rate issues, yet to mitigate the latency
>> it modifies the throttling.
>
> If we have too many interrupts, we need to drop the rate (existing
> throttling).
>
> If the interrupts _consistently_ take too long individually they can
> starve out all the other CPU users.  I saw no way to make them finish
> faster, so the only recourse is to also drop the rate.
>
I think we need to investigate why some interrupts take so much time.
It could be HW, it could be SW. We are not talking about old hardware here.
Once we understand this, we will know whether and how to adjust the
timing thresholds in the patch.

>> For some unknown reasons, my HSW interrupt handler goes crazy for
>> a while running a very simple:
>>    $ perf record -e cycles branchy_loop
>>
>> And I do see in the log:
>> perf samples too long (2546 > 2500), lowering
>> kernel.perf_event_max_sample_rate to 50000
>>
>> Which is an enormous latency. I instrumented the code, and under
>> normal conditions the latency of the handler for this perf run is
>> about 500ns, which is consistent with what I see on SNB.
>
> I was seeing latencies near 1 second from time to time, but
> _consistently_ in the hundreds of milliseconds.

On my systems, I see 500ns with one session running. But on HSW,
something else is going on with bursts at 2500ns. That's not normal.
I want an explanation for this.
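
As a side note, the 2500 threshold in the quoted log line can be derived
from the default sysctl values. Below is a simplified sketch of the per-sample
time budget the kernel enforces, assuming the defaults
kernel.perf_event_max_sample_rate=100000 and kernel.perf_cpu_time_max_percent=25
(the real kernel computes this per tick, but the resulting number is the same):

```python
def sample_allowed_ns(max_sample_rate_hz, cpu_time_max_percent):
    # Per-sample budget: the fraction of one second the kernel is willing
    # to spend in perf interrupt handlers, divided by the sample rate.
    ns_per_sec = 1_000_000_000
    return ns_per_sec * cpu_time_max_percent // 100 // max_sample_rate_hz

# Defaults give the 2500 ns threshold seen in the log message:
print(sample_allowed_ns(100_000, 25))  # 2500

# After the kernel halves the rate to 50000 Hz, the budget doubles:
print(sample_allowed_ns(50_000, 25))   # 5000
```

So a burst of ~2546 ns handlers, as reported in the log, is just over the
default budget and triggers the rate reduction.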
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/