Message-ID: <53864517.3040503@intel.com>
Date:	Wed, 28 May 2014 13:20:39 -0700
From:	Dave Hansen <dave.hansen@...el.com>
To:	Stephane Eranian <eranian@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"mingo@...e.hu" <mingo@...e.hu>,
	"ak@...ux.intel.com" <ak@...ux.intel.com>,
	"Yan, Zheng" <zheng.z.yan@...el.com>
Subject: Re: [RFC] perf/x86: PMU IRQ handler issues

On 05/28/2014 12:48 PM, Stephane Eranian wrote:
> A few days ago, I was alerted that under heavy network load, something
> goes wrong with perf_event sampling in frequency mode (such as perf
> top): the number of samples was way too low given the cycle count (via
> perf stat).  Looking at the syslog, I noticed that the perf irq latency
> throttler had kicked in several times.  There may have been several
> reasons for this.
> 
> Maybe the workload had changing phases and the frequency adjustment was
> not working properly, dropping to a very small period and then
> generating a flood of interrupts.
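
For context: in frequency mode the kernel re-derives the sampling period
from the observed event rate, so a burst followed by a quiet phase can
leave a too-small period behind.  A minimal sketch of that idea
(illustrative names and units; not the kernel's actual
perf_adjust_period()/perf_calculate_period() code):

#include <stdint.h>

/*
 * Pick a period so that rate/period ~= the requested sample frequency.
 * Simplified sketch only.
 */
static uint64_t calculate_period(uint64_t events, uint64_t nsec,
				 uint64_t target_freq)
{
	/* events per second actually observed in this window */
	uint64_t rate = nsec ? events * 1000000000ULL / nsec : 0;

	if (!target_freq || rate < target_freq)
		return 1;	/* sample on every event */

	/*
	 * A burst drives rate up and the period shrinks; when the burst
	 * ends, that small period stays in effect until the next
	 * adjustment, which is when an interrupt flood can hit.
	 */
	return rate / target_freq;
}

If the adjustment lags the workload's phase changes, that division can
land on a very small period, which would match the flood described above.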

The problem description here is pretty fuzzy.  Could you give some
actual numbers describing the issues that you're seeing, including the
ftrace data that Andi was asking for?  There are also some handy
tracepoints for NMI lengths that I stuck in.

The reason that the throttling code is there is that the CPU can get
into a state where it is doing *NOTHING* other than processing NMIs (the
biggest of which are the perf-driven ones).  It doesn't start throttling
until 128 samples end up averaging more than the limit.
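
Roughly, the check works like this (a simplified sketch of the idea
behind perf_sample_event_took(); the budget value and the back-off
policy below are made up for illustration, the real code derives them
from the perf_cpu_time_max_percent sysctl):

#include <stdint.h>
#include <stdio.h>

#define NR_ACCUMULATED_SAMPLES	128

static uint64_t avg_sample_ns;			/* running average, ns */
static uint64_t allowed_ns = 50000;		/* per-sample budget: assumed */
static unsigned int max_sample_rate = 100000;	/* samples/sec cap: assumed */

/* called with how long the PMU NMI handler just took */
static void sample_event_took(uint64_t sample_ns)
{
	/* moving average weighted over ~128 samples */
	avg_sample_ns -= avg_sample_ns / NR_ACCUMULATED_SAMPLES;
	avg_sample_ns += sample_ns / NR_ACCUMULATED_SAMPLES;

	if (avg_sample_ns <= allowed_ns)
		return;

	/* sustained overload: cut the sample rate and complain */
	max_sample_rate /= 2;
	fprintf(stderr,
		"perf samples too long (avg %llu ns), lowering rate to %u\n",
		(unsigned long long)avg_sample_ns, max_sample_rate);
}

The point of averaging over 128 samples is exactly to ignore one-off
slow NMIs and only react to a sustained overload.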

How large of a system is this, btw?  I had the worst issues on a
160-logical-cpu system.  It was much harder to get it into trouble on
smaller systems.
