linux-kernel - Re: [PATCH] perf, nmi: fix unknown NMI warning


Message-ID: <20140216183850.GD32005@two.firstfloor.org>
Date:	Sun, 16 Feb 2014 19:38:50 +0100
From:	Andi Kleen <andi@...stfloor.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Andi Kleen <andi@...stfloor.org>, mingo@...nel.org,
	eranian@...gle.com, linux-kernel@...r.kernel.org,
	Markus Metzger <markus.t.metzger@...el.com>,
	Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH] perf, nmi: fix unknown NMI warning

> This reminds me of the late-ack stuff;
> 
> The way I understand interrupts to work is that when you raise the
> interrupt it gets latched, when you ACK you drop the latch. Then when it
> gets re-raised while it's still in progress, it gets latched again and
> the irq-enable at the end of the running handler will get it to trigger
> again.
> 
> So by late-ACK-ing the PMI we can miss PMIs that happen between enabling
> the PMU and ACKing the PMI.

My understanding is that all these things are different latches/states, like
semaphores in a queue: a pending state, a not-acked state, and an
interrupts-disabled state. There's also some delay in propagating between
the states, which was the reason we needed the late-ack in the first place.

Your argument relies on (1) and (2) being the same physical latch,
right?

The late-ack method was originally blessed by the hardware architects.

Also I don't think it would matter in any case because:

> 
> We should either re-check the overflow mask after the ACK or do the ACK
> while the PMU is disabled.

For the PMU that would just be a back-to-back PMI. We filter those
out anyway.

And if we're in a state where PMIs get re-raised quickly, we should either
regulate the sampling rate down (i.e. raise the period) or start throttling.

Plus, even if we get it wrong occasionally, sampling is just
statistics and the law of large numbers wins in the end.

BTW low period sampling is really a perf oddity; most other
profilers don't do that for hw sampling, as it just causes a lot of overhead
and doesn't give better sampling results.

Currently the adaptive period algorithm unfortunately has a tendency
to go very low occasionally before recovering, but I always considered that
a bug.
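
A rough sketch of how a frequency-driven period estimate can dip very low:
if the period is derived from the event rate seen in the last interval, a
quiet interval yields a tiny period. The C function below is a simplified
model under that assumption, not the algorithm perf actually uses;
toy_next_period and the min_period floor are hypothetical, and the floor is
just one possible guard, not something proposed in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL

/*
 * Estimate the period (events per sample) that would give target_hz
 * samples per second, from the number of events counted over nsec
 * nanoseconds:
 *
 *   rate   = events * 1e9 / nsec     (events per second)
 *   period = rate / target_hz
 *
 * The result is clamped to min_period so that one quiet interval
 * cannot push it arbitrarily low. Inputs are assumed small enough
 * that the 64-bit products do not overflow.
 */
static uint64_t toy_next_period(uint64_t events, uint64_t nsec,
                                uint64_t target_hz, uint64_t min_period)
{
    uint64_t period;

    assert(nsec > 0 && target_hz > 0);
    assert(events <= UINT64_MAX / TOY_NSEC_PER_SEC);
    assert(nsec <= UINT64_MAX / target_hz);

    period = (events * TOY_NSEC_PER_SEC) / (nsec * target_hz);

    return period < min_period ? min_period : period;
}
```

For example, 1,000,000 events in 1 ms at a 1000 Hz target gives a period of
1,000,000, while only 10 events in the same interval gives a raw period of
10, which a floor of 1000 would clamp back up.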

I suspect the only case where low period makes sense is s/w tracepoints.

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.