linux-kernel - RE: [PATCH] x86: auto poll/interrupt mode switch for CMC to stop CMC storm


Message-ID: <alpine.LFD.2.02.1205242013490.3231@ionos>
Date:	Thu, 24 May 2012 20:18:07 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	"Luck, Tony" <tony.luck@...el.com>
cc:	Chen Gong <gong.chen@...ux.intel.com>,
	"bp@...64.org" <bp@...64.org>, "x86@...nel.org" <x86@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: RE: [PATCH] x86: auto poll/interrupt mode switch for CMC to stop
 CMC storm

On Thu, 24 May 2012, Luck, Tony wrote:

> > So can you please explain how this is better than having this strict
> > per cpu and avoid all the mess which comes with that patch? The
> > approach of letting global state be modified in a random manner is
> > just doomed.
> 
> Well doomed sounds bad :-) ... and I think I now agree that we should
> get rid of global state and have polling vs. CMCI mode be per-cpu. It
> means that it will take fractionally longer to react to a storm, but
> on the plus side we'll naturally set storm mode on just the cpus
> that are seeing it on a multi-socket system without having to check
> topology data ... which should be better for the case where a noisy
> source of CMCI is plaguing one socket, while other sockets have some
> much lower rate of CMCI that we'd still like to log.

I thought more about it - see my patch. So I have a global state now
as well, but it only makes sure that CPUs stay in poll mode as long
as others are in poll mode. That's good, I think, as it avoids the
following:

CMCIs which affect siblings or a socket are delivered to all affected
cores, but only one core might see the bank. So all the others would
re-enable CMCI quickly and then switch back to polling because the
storm still persists. This would ping-pong, so we probably want to
avoid it.

Ideally the storm_on_cpus variable should be per socket and not system
wide, but we can do that when it really becomes an issue.
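
For illustration only (this is not the patch referred to above), a
minimal single-threaded user-space sketch of the idea: each CPU keeps
its own mode, and a shared counter holds a quiet CPU in poll mode while
any other CPU still sees the storm. Only the name storm_on_cpus comes
from this mail; NR_CPUS, cpu_mode, cpu_in_storm, cmc_storm_begin() and
cmc_poll_quiet() are hypothetical, and real kernel code would need
per-cpu variables and an atomic counter.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

enum cmc_mode { CMC_INTERRUPT, CMC_POLL };

/* Per-CPU state; plain arrays stand in for per-cpu variables. */
static enum cmc_mode cpu_mode[NR_CPUS];
static bool cpu_in_storm[NR_CPUS];

/* Number of CPUs which currently see a storm (system wide). */
static int storm_on_cpus;

/* This CPU received too many CMCIs: count it and switch it to polling. */
static void cmc_storm_begin(int cpu)
{
	if (!cpu_in_storm[cpu]) {
		cpu_in_storm[cpu] = true;
		storm_on_cpus++;
	}
	cpu_mode[cpu] = CMC_POLL;
}

/*
 * The poll timer found nothing on this CPU. Drop its storm vote, but
 * only go back to interrupt mode when no other CPU still sees the
 * storm. Returns true if the CPU switched back to interrupt mode.
 */
static bool cmc_poll_quiet(int cpu)
{
	if (cpu_in_storm[cpu]) {
		cpu_in_storm[cpu] = false;
		storm_on_cpus--;
	}
	if (storm_on_cpus > 0)
		return false;	/* stay in poll mode, avoids the ping pong */

	cpu_mode[cpu] = CMC_INTERRUPT;
	return true;
}
```

With two CPUs in storm mode, a quiet poll on the one which cannot see
the bank leaves it polling; it only returns to interrupt mode after the
CPU which does see the bank has gone quiet as well.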

Thanks,

	tglx