Message-ID: <4FC88E18.2040107@linux.intel.com>
Date: Fri, 01 Jun 2012 17:40:40 +0800
From: Chen Gong <gong.chen@...ux.intel.com>
To: Borislav Petkov <bp@...64.org>
CC: "Luck, Tony" <tony.luck@...el.com>,
Steven Rostedt <rostedt@...dmis.org>,
Mauro Carvalho Chehab <mchehab@...hat.com>,
Linux Edac Mailing List <linux-edac@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Aristeu Rozanski <arozansk@...hat.com>,
Doug Thompson <norsk5@...oo.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH] RAS: Add a tracepoint for reporting memory controller
events
On 2012/6/1 17:10, Borislav Petkov wrote:
> On Thu, May 31, 2012 at 08:52:21PM +0000, Luck, Tony wrote:
>>> It could be very quiet (i.e., machine runs with no errors) and
>>> it could have bursts where it reports a large number of errors
>>> back-to-back depending on access patterns, DIMM health,
>>> temperature, sea level and at least a bunch more factors.
>>
>> Yes - the normal case is a few errors from stray neutrons ...
>> perhaps a few per month, maybe on a very big system a few per
>> hour. When something breaks, especially if it affects a wide
>> range of memory addresses, then you will see a storm of errors.
>
> IOW, when the sh*t hits the fan :-)
>
>>> So I can imagine buffers filling up suddenly and fast, and
>>> userspace having hard time consuming them in a timely manner.
>>
>> But I'm wondering what agent is going to be reporting all these
>> errors. Intel has CMCI - so you can get a storm of interrupts
>> which would each generate a trace record ... but we are working
>> on a patch to turn off CMCI if a storm is detected.
>
> Yeah, about that. What are you guys doing about losing CECCs when
> throttling is on, I'm assuming there's no way around it?
>
I've been busy with other work this week, so I haven't had time to debug
Thomas' patch any further. I will pick it up again in the next few days.