lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABPqkBT3QcKPGUoo2Wuvn0TdK3TKcMWtbuq4jGqbouBvqaKgDg@mail.gmail.com>
Date:	Mon, 8 Jul 2013 20:08:25 +0200
From:	Stephane Eranian <eranian@...gle.com>
To:	Dave Hansen <dave.hansen@...el.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"mingo@...e.hu" <mingo@...e.hu>, dave.hansen@...ux.intel.com,
	"ak@...ux.intel.com" <ak@...ux.intel.com>,
	Jiri Olsa <jolsa@...hat.com>
Subject: Re: [PATCH] perf: fix interrupt handler timing harness

On Mon, Jul 8, 2013 at 4:34 PM, Dave Hansen <dave.hansen@...el.com> wrote:
> On 07/04/2013 03:30 PM, Stephane Eranian wrote:
>> There was an misunderstanding on the API of the do_div()
>> macro. It returns the remainder of the division and this
>> was not what the function expected leading to disabling the
>> interrupt latency watchdog.
>
> "Misunderstanding" is a very kind term for what I did there.  Stephane,
> were you actually running in to the new cpu limit, or were you just
> trying to update kernel.perf_event_max_sample_rate?
>
I am chasing a problem on HSW desktop where your code triggers the throttling.
It happens systematically on my system first time I run perf record usually the
first time.

I admit I have some issues with your patch and what it is trying to avoid.
There is already interrupt throttling. Your code seems to address latency
issues on the handler rather than rate issues. Yet to mitigate the latency
it is modify the throttling.

For some unknown reasons, my HSW interrupt handler goes crazy for
a while running a very simple:
   $ perf record -e cycles branchy_loop

And I do see in the log:
perf samples too long (2546 > 2500), lowering
kernel.perf_event_max_sample_rate to 50000

Which is an enormous latency. I instrumented the code, and under
normal conditions the latency
of the handler for this perf run, is about 500ns and it is consistent
with what I see on SNB.
However, on HSW, it is sometimes 5x longer. I have no explanation for
this. Andi's patch
to delay the NMI ack is enabled but it does not alleviate this
problem. There is something
else going on possibly with the HW, and I don't know what it is. This
is not a one off. It lasts for
several calls because it fires your watchdog which averages out over
multiple calls.
So one cannot blame this on the case where the samping buffer gets
full, for instance.

My model is:
processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x7

So something is not right and I would like to know what it is.
I will keep investigating.

> BTW, I also did the same thing in 2ab00456:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2ab00456ea8a0d79acb1390659b98416111880b2
>
> I'll have a patch out shortly for that one, but its damage would be
> limited to a bogus printk.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ