Message-ID: <alpine.LRH.2.00.0904301522100.12182@pedra.chehab.org>
Date: Thu, 30 Apr 2009 15:37:28 -0300 (BRT)
From: Mauro Carvalho Chehab <mchehab@...radead.org>
To: Andi Kleen <andi@...stfloor.org>
cc: Borislav Petkov <borislav.petkov@....com>,
akpm@...ux-foundation.org, greg@...ah.com, mingo@...e.hu,
tglx@...utronix.de, hpa@...or.com, dougthompson@...ssion.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 00/21 v2] amd64_edac: EDAC module for AMD64
On Thu, 30 Apr 2009, Andi Kleen wrote:
>> Kconfig, mce code delivers needed error info to edac which, in turn,
>> goes and decodes the error/does the mapping to DIMM blocks/supplies DRAM
>> error injection facility for testing purposes and similar things. That
>> way you have both and they don't overlap in functionality.
>
> You can do that, but it's redundant because mcelog can do this
> already. I had some conversations with existing EDAC users
> recently and they seem to only care about the resulting output,
> so just querying from mcelog is fine.
> The only issue is that mcelog needs to get the DIMM data. In many
> cases it can do so from SMBIOS output, if not a suitable interface
> would need to be provided by the kernel.
From what I've heard from existing EDAC users, they have several
concerns about whether mcelog could be a viable replacement for their EDAC
usage, due to performance issues, including the need to access SMBIOS in
order to get such information.
Also, the EDAC interface is already established and, as pointed out by Doug,
it is very useful in cluster environments, where memory failures are a big
issue and need to be resolved as soon as possible.
EDAC solves this issue very well and works on a wider range of designs
than mcelog. So, there's no reason to deprecate it or to reject patches
adding EDAC interfaces to other chips.
On the other hand, mcelog is also useful in different scenarios. So, they
are not competing technologies, but complementary ones.
So, assuming that both EDAC and mcelog are needed, the proper design for
those chipsets where the memory controller is integrated with other log
functions (like AMD64 and Nehalem) seems to be to build a single kernel layer
that retrieves the error logs from the hardware and allows access to the
same data via both the mcelog and EDAC userspace APIs.
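As a rough illustration of that layering, here is a small userspace sketch
in C. It is not kernel code and not taken from any patch in this series:
every name in it (mem_err_record, err_layer, edac_consumer, mcelog_consumer)
is hypothetical, and the record fields are placeholders. It only shows the
idea of reading an error once and fanning the same record out to two
independent consumers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error record; the fields are illustrative, not a real ABI. */
struct mem_err_record {
    uint64_t status;    /* raw status word as read from the hardware */
    uint64_t addr;      /* failing physical address, if reported */
    int      corrected; /* nonzero = corrected error, zero = uncorrected */
};

typedef void (*err_consumer_fn)(const struct mem_err_record *rec, void *ctx);

#define MAX_CONSUMERS 4

/* The single shared layer: a fixed table of registered consumers. */
struct err_layer {
    err_consumer_fn fn[MAX_CONSUMERS];
    void           *ctx[MAX_CONSUMERS];
    size_t          count;
};

/* Register a consumer. Returns 0 on success, -1 if fn is NULL or the table is full. */
static int err_layer_register(struct err_layer *l, err_consumer_fn fn, void *ctx)
{
    if (fn == NULL || l->count >= MAX_CONSUMERS)
        return -1;
    l->fn[l->count] = fn;
    l->ctx[l->count] = ctx;
    l->count++;
    return 0;
}

/* Called once per error retrieved from hardware; every consumer sees the same record. */
static void err_layer_report(const struct err_layer *l,
                             const struct mem_err_record *rec)
{
    for (size_t i = 0; i < l->count; i++)
        l->fn[i](rec, l->ctx[i]);
}

/* EDAC-style consumer (assumed behaviour): keeps corrected/uncorrected counters. */
struct edac_counters {
    unsigned long ce;
    unsigned long ue;
};

static void edac_consumer(const struct mem_err_record *rec, void *ctx)
{
    struct edac_counters *c = ctx;

    if (rec->corrected)
        c->ce++;
    else
        c->ue++;
}

/* mcelog-style consumer (assumed behaviour): stores raw records for later reading. */
#define RAW_LOG_LEN 16

struct raw_log {
    struct mem_err_record entries[RAW_LOG_LEN];
    size_t len;
};

static void mcelog_consumer(const struct mem_err_record *rec, void *ctx)
{
    struct raw_log *rl = ctx;

    if (rl->len < RAW_LOG_LEN)   /* drop the record when the buffer is full */
        rl->entries[rl->len++] = *rec;
}
```

A caller would zero-initialize one err_layer, register both consumers on it,
and call err_layer_report() once per hardware error; neither consumer needs
to touch the hardware or know about the other.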
Cheers,
Mauro