Message-ID: <4FDADB74.3060701@linux.intel.com>
Date: Fri, 15 Jun 2012 14:51:32 +0800
From: Chen Gong <gong.chen@...ux.intel.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: tony.luck@...el.com, borislav.petkov@....com, x86@...nel.org,
peterz@...radead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] tmp patch to fix hotplug issue in CMCI storm
On 2012/6/14 22:07, Thomas Gleixner wrote:
> On Thu, 14 Jun 2012, Chen Gong wrote:
>> this patch is based on tip tree and previous 5 patches.
>
> You really don't need all this complexity to handle that. The main
> thing is that you clear the storm state and adjust the storm counter
> when the cpu goes offline (in case the state is ACTIVE).
>
> When it comes online again then you can simply let it restart cmci. If
> the storm on this cpu (or node) still exists then it will notice and
> everything falls in place.
I tested several different scenarios. If the storm on this CPU still
exists when it comes back online, it triggers a CMCI that is broadcast
to the sibling CPUs, which means the counter *cmci_storm_on_cpus* will
increase beyond its upper limit. E.g. on a 2-socket SandyBridge-EP
system (each socket has 8 cores and 16 threads), inject one error on
one socket and you can watch *cmci_storm_on_cpus* = 16 because of the
CMCI broadcast. While the storm lasts, offline and then online one CPU
on this socket: first *cmci_storm_on_cpus* = 15, because the offlined
CPU was in ACTIVE state, and then *cmci_storm_on_cpus* = 31, because
CMCI is activated again when the CPU comes online. That's why I have to
keep CMCI disabled during the whole online/offline sequence until the
CMCI storm has subsided. Frankly, the logic is a little complex, so I
wrote many comments to avoid forgetting it after some time :-)
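
To make the arithmetic concrete, here is a small userspace model of the
counter. It is only a sketch of the numbers described in this mail, not
the kernel code: the helper names (socket_init, cmci_broadcast,
cpu_offline, cpu_online, replay_scenario) are made up for illustration,
and it assumes that every broadcast CMCI makes each online thread on the
socket count one storm entry.

```c
#include <assert.h>
#include <stdbool.h>

#define THREADS_PER_SOCKET 16	/* 8 cores x 2 threads, as in the example */

/* Hypothetical model state; only the counter name comes from the patch. */
static bool online[THREADS_PER_SOCKET];
static bool storm_active[THREADS_PER_SOCKET];
static int cmci_storm_on_cpus;

static void socket_init(void)
{
	for (int cpu = 0; cpu < THREADS_PER_SOCKET; cpu++) {
		online[cpu] = true;
		storm_active[cpu] = false;
	}
	cmci_storm_on_cpus = 0;
}

/* Assumption: a broadcast CMCI is counted once by every online thread. */
static void cmci_broadcast(void)
{
	for (int cpu = 0; cpu < THREADS_PER_SOCKET; cpu++) {
		if (!online[cpu])
			continue;
		storm_active[cpu] = true;
		cmci_storm_on_cpus++;
	}
}

/* Offline: clear the storm state and adjust the counter if ACTIVE. */
static void cpu_offline(int cpu)
{
	assert(cpu >= 0 && cpu < THREADS_PER_SOCKET);
	if (storm_active[cpu]) {
		storm_active[cpu] = false;
		cmci_storm_on_cpus--;
	}
	online[cpu] = false;
}

/*
 * Online with CMCI simply restarted: the storm still exists, so the
 * re-enabled CMCI fires at once and is broadcast to the siblings.
 */
static void cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < THREADS_PER_SOCKET);
	online[cpu] = true;
	cmci_broadcast();
}

/* Replays the scenario; returns the final counter value. */
static int replay_scenario(int *after_inject, int *after_offline)
{
	socket_init();
	cmci_broadcast();		/* inject one error on the socket */
	*after_inject = cmci_storm_on_cpus;
	cpu_offline(3);			/* any CPU on this socket */
	*after_offline = cmci_storm_on_cpus;
	cpu_online(3);
	return cmci_storm_on_cpus;
}
```

Calling replay_scenario() walks the counter through the three values
above. In this model the overshoot comes only from cpu_online() letting
the broadcast happen while the storm is still present, which is the
step the patch avoids by keeping CMCI disabled.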