Date:	Wed, 04 Jan 2012 22:24:58 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Stephane Eranian <eranian@...gle.com>
Cc:	linux-kernel@...r.kernel.org, mingo@...e.hu, gleb@...hat.com,
	asharma@...com, vince@...ter.net, wcohen@...hat.com
Subject: Re: perf_events: proposed fix for broken intr throttling (repost)

On Wed, 2012-01-04 at 15:39 +0100, Stephane Eranian wrote:

> While running some tests with 3.2.0-rc7-tip, I noticed unexpected throttling
> notification samples. I was using a fixed period long enough that I could
> not possibly hit the default limit of 100000 samples/sec/cpu.
> 
> I investigated the matter and discovered that the following commit
> is the culprit:
> 
> commit 0f5a2601284237e2ba089389fd75d67f77626cef
> Author: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> Date:   Wed Nov 16 14:38:16 2011 +0100
> 
>     perf: Avoid a useless pmu_disable() in the perf-tick
> 
> 
> The throttling mechanism REQUIRES that the hwc->interrupts counter be reset
> at EACH timer tick, regardless of whether the event is in fixed-period or
> frequency mode. The optimization introduced in this patch breaks this by
> skipping the call to perf_ctx_adjust_freq() at each timer tick. For events
> with a fixed period, that function would not adjust any period at all, BUT
> it would reset the throttling counter.
> 
> Given the way the throttling mechanism is implemented we cannot avoid doing
> some work at each timer tick. Otherwise we lose many samples for no good
> reason.
> 
> One may also question the motivation behind checking the interrupt rate at
> each timer tick rather than every second, i.e., average it out over a longer
> period.

That also allows your system to be dead for longer..
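
To make the failure mode above concrete, here is a minimal userspace model
of the per-tick throttle check. It is a sketch, not kernel code: all model_*
names are hypothetical, the 1000 Hz tick rate is an assumption, and only the
100000 samples/sec/cpu limit comes from the report. It assumes the check
compares the per-tick interrupt count, scaled by the tick rate, against the
limit.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HZ       1000u    /* assumed tick rate, not from the report */
#define MODEL_MAX_RATE 100000u  /* default limit quoted above (samples/sec/cpu) */

/* Hypothetical stand-in for the per-event throttling state. */
struct model_event {
	uint64_t interrupts;   /* overflows counted since the last reset */
	bool throttled;        /* true once the limit has been exceeded */
};

/* Model of one counter overflow; returns true if the sample is suppressed. */
static bool model_overflow(struct model_event *ev)
{
	if (ev->throttled)
		return true;
	ev->interrupts++;
	if ((uint64_t)MODEL_HZ * ev->interrupts > MODEL_MAX_RATE) {
		ev->throttled = true;
		return true;
	}
	return false;
}

/* Model of the per-tick work: reset the count and unthrottle. */
static void model_tick(struct model_event *ev)
{
	ev->interrupts = 0;
	ev->throttled = false;
}

/*
 * Run `ticks` timer ticks with `per_tick` overflows in each one and
 * return how many samples were suppressed. With reset_each_tick false,
 * the tick never touches the event, as in the reported regression.
 */
static uint64_t model_run(unsigned ticks, unsigned per_tick, bool reset_each_tick)
{
	struct model_event ev = { 0, false };
	uint64_t suppressed = 0;

	for (unsigned t = 0; t < ticks; t++) {
		for (unsigned i = 0; i < per_tick; i++)
			if (model_overflow(&ev))
				suppressed++;
		if (reset_each_tick)
			model_tick(&ev);
	}
	return suppressed;
}
```

In this model, 10 overflows per tick is a tenth of the per-tick budget, so
nothing is suppressed while the tick resets the counter. Without the reset
the count keeps accumulating across ticks, the event eventually trips the
limit, and it then stays throttled.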

> I see two solutions short term:
>    1 - revert the commit above
>    2 - special case the situation with no frequency-based sampling event
> 
> I have implemented solution 2 with the draft fix below. It does not invoke
> perf_pmu_enable()/perf_pmu_disable().  I am not clear on whether or not this
> is really needed in this case. Please advise.

I don't think it needs that. I do dislike the unconditional iteration over
all events, though. Maybe we can set some per-cpu state indicating that
someone got throttled (rare under normal operation, you'd hope) and
only iterate to unthrottle when we find this set.

I think the event scheduling resulting from migration will already
re-enable the event, so an unthrottle should not get lost that way,
although it would be good to verify that.
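
A rough userspace sketch of the per-cpu state idea, to show the shape of
it. Everything here is an assumption made for illustration: the sketch_*
names are hypothetical, the per-tick budget of 100 assumes 1000 ticks/sec
against the 100000 samples/sec limit, and the tick sequence number used to
reset the per-event count lazily is just one possible way to avoid touching
every event at each tick.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LIMIT_PER_TICK 100u  /* assumed per-tick budget */

struct sketch_event {
	uint64_t interrupts;  /* overflows counted in the current tick */
	uint64_t seq;         /* tick sequence the count belongs to */
	bool throttled;
};

/* Hypothetical per-cpu state. */
struct sketch_cpu {
	struct sketch_event *events;
	size_t nr_events;
	uint64_t tick_seq;         /* bumped at every tick */
	unsigned throttled_count;  /* events throttled since the last tick */
	unsigned long walks;       /* number of full event-list walks (for testing) */
};

/* Overflow path; returns true if the sample is suppressed. */
static bool sketch_overflow(struct sketch_cpu *cpu, struct sketch_event *ev)
{
	if (ev->throttled)
		return true;
	/* Lazy reset: a stale sequence means a tick has passed. */
	if (ev->seq != cpu->tick_seq) {
		ev->seq = cpu->tick_seq;
		ev->interrupts = 0;
	}
	if (++ev->interrupts > SKETCH_LIMIT_PER_TICK) {
		ev->throttled = true;
		cpu->throttled_count++;  /* tell the tick there is work to do */
		return true;
	}
	return false;
}

/* Tick path; only walks the events when someone got throttled. */
static void sketch_tick(struct sketch_cpu *cpu)
{
	cpu->tick_seq++;
	if (cpu->throttled_count == 0)
		return;  /* common case: nothing to unthrottle */
	cpu->throttled_count = 0;
	cpu->walks++;
	for (size_t i = 0; i < cpu->nr_events; i++)
		cpu->events[i].throttled = false;
}
```

The common-case tick is then a counter increment and one comparison, and
the walk over all events only happens after a throttle. Whether migration
needs extra handling is not covered by this sketch.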
