Message-ID: <4F54D4AF.9060802@redhat.com>
Date: Mon, 05 Mar 2012 11:58:55 -0300
From: Mauro Carvalho Chehab <mchehab@...hat.com>
To: Borislav Petkov <bp@...64.org>
CC: Tony Luck <tony.luck@...el.com>, Ingo Molnar <mingo@...e.hu>,
EDAC devel <linux-edac@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 4/4] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
On 05-03-2012 11:13, Borislav Petkov wrote:
> On Mon, Mar 05, 2012 at 10:35:47AM -0300, Mauro Carvalho Chehab wrote:
>> No. This is an example that you're not reading my emails:
>
> Unfortunately, I read your emails.
>
>> no other driver needs that. So, it is something that is specific to
>> the MCA amd64 drivers.
>
> Let me spell it for ya: no, it's specific to x86, and not to amd64_edac.
Since I'll NACK adding this solution to my drivers, as it makes no sense there,
it is specific to amd64_edac/amd64 mce.
>> The other two MCA drivers are sb_edac and i7core_edac. I wrote both drivers, and they
>> don't need any helper function to store strings on a temporary buffer.
>>
>> Also, the edac core is not x86-specific. So, referencing a var there (ras_agent)
>> that it is defined inside arch/x86 would break Kernel compilation on all other
>> architectures.
>
> That's more like it.
>
> It can be moved to an arch-agnostic place or be defined
> __attribute__((weak)) in edac_core.c. Unless someone has a better idea,
> of course.
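A weak definition lets arch-agnostic code refer to a symbol that only some architectures provide. The C sketch below assumes GCC/Clang on an ELF toolchain. The `int` type for `ras_agent` is an assumption, since the message does not show its declaration, and `edac_use_ras_agent()` is a hypothetical helper invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Weak default, as it might live in an arch-agnostic file such as
 * edac_core.c. ASSUMPTION: ras_agent is an int flag; its real type
 * is not shown in the thread. If another object file (for instance
 * one built only on x86) provides a strong definition, the linker
 * uses that one, so other architectures still compile and link.
 */
int ras_agent __attribute__((weak)) = 0;

/* Hypothetical helper: should the EDAC core hand errors to the RAS agent? */
bool edac_use_ras_agent(void)
{
	return ras_agent != 0;
}
```

Built on its own, `edac_use_ras_agent()` returns false. Linking in a second object that contains a strong `int ras_agent = 1;` would change that without touching the arch-agnostic file.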
Well, just fill in the string in whatever way makes sense for amd64, then call the
EDAC report function and let it call the trace function.
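The call flow suggested here is that the driver builds the message, the EDAC core receives it, and the core alone emits the trace event. The C sketch below illustrates that layering. Every name in it is hypothetical, and a plain function that records the last message stands in for a real kernel tracepoint.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tracepoint: remembers the last event. */
static char last_trace[128];

static void trace_mc_error_stub(const char *msg)
{
	strncpy(last_trace, msg, sizeof(last_trace) - 1);
	last_trace[sizeof(last_trace) - 1] = '\0';
}

/* Hypothetical EDAC core entry point: the core, not the driver,
 * decides to emit the trace event. */
void edac_report_error(const char *driver_msg)
{
	assert(driver_msg != NULL);
	trace_mc_error_stub(driver_msg);
}

/* Hypothetical amd64-specific handler: the driver formats the string
 * in whatever way suits its hardware, then hands it to the core. */
void amd64_handle_error(unsigned int node, unsigned int chan)
{
	char msg[64];

	snprintf(msg, sizeof(msg), "node %u, channel %u", node, chan);
	edac_report_error(msg);
}

/* Accessor so callers can inspect what the stub "traced". */
const char *last_trace_event(void)
{
	return last_trace;
}
```

With this layering, the driver never calls the trace function directly. Any driver-specific buffering stays inside the driver, and the core remains the single place where trace events are emitted.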
>
> [..]
>
>> As already pointed out, you're not reading my emails. The above was in version 1 of
>> my patches, which I sent at least a month ago. Since version 2, what is proposed is to use:
>>
>> TRACE_EVENT(mc_error_mce,
>>
>> for MCA-based memory error events. There's also a variant for non-MCA drivers (mc_error).
>>
>> [1] http://git.kernel.org/?p=linux/kernel/git/mchehab/linux-edac.git;a=commitdiff;h=4eb2a29419c1fefd76c8dbcd308b84a4b52faf4d
>
> I see at least 4 misdesigned tracepoints there:
>
> trace_mc_out_of_range_mce
> trace_mc_out_of_range
> trace_mc_error_mce
> trace_mc_error
> ...
There's no "..." there. There are just 4 traces defined.
The out-of-range traces are a special case for reporting parse errors.
As I said before, I'm OK with removing the *out_of_range* traces.
So, there are just two traces:
trace_mc_error_mce
trace_mc_error
I.e., one for MCA errors, and another one for drivers that don't use MCA
error handling.
> so NACK to those.
>
>> I also wrote on my emails that, instead of having a tracepoint
>> specific for memory errors, it is possible to re-define the fields
>> I've proposed to cover CPU location/socket label, and that this is
>> better than folding everything into a hard-to-parse single string
>> message.
>
> No, this is repurposing the fields of memory errors, which is ugly. So, no.
Then it should have two MCA error traces:
- one for errors inside the CPU socket;
- another one for errors outside the CPU.
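Splitting MCA errors this way requires telling a memory error apart from the rest using only the MCA status register. The minimal C sketch below assumes Intel's compound MCA error-code encoding. The error code occupies bits 15:0 of IA32_MCi_STATUS, and memory-controller errors have the form `000F 0000 1MMM CCCC`, where F (bit 12) is a filtering bit that is masked out. A similar mask-and-shift check appears in Intel EDAC drivers such as sb_edac. The enum and function names are hypothetical. Treating every non-memory code as "inside the socket" simply encodes the assumption Mauro asks Tony to confirm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical identifiers for the two proposed MCA trace events. */
enum mca_trace_kind {
	MCA_TRACE_IN_SOCKET, /* e.g. cache and TLB errors */
	MCA_TRACE_MEMORY,    /* memory-controller (DIMM-side) errors */
};

/*
 * The MCA error code is bits 15:0 of IA32_MCi_STATUS. Memory-controller
 * compound codes look like 000F 0000 1MMM CCCC: after masking F (bit 12),
 * shifting right by 7 must leave exactly 1.
 */
static int is_memory_controller_error(uint64_t mci_status)
{
	uint16_t mcacod = (uint16_t)(mci_status & 0xffff);

	return ((mcacod & 0xefff) >> 7) == 1;
}

/* Pick which of the two hypothetical trace events an error would use. */
enum mca_trace_kind classify_mca_error(uint64_t mci_status)
{
	return is_memory_controller_error(mci_status) ? MCA_TRACE_MEMORY
						      : MCA_TRACE_IN_SOCKET;
}
```

For example, a status whose low 16 bits are 0x009f (memory read, channel unspecified) is classified as a memory error, and so is 0x109f, which only adds the F bit. A cache-hierarchy code such as 0x0135 falls into the in-socket bucket.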
Tony,
Please correct me if I'm wrong, but Intel MCA can only point to an error inside
the CPU or to a memory error, right? At least, I didn't find anything in the MCA
registers described in the x86 arch specs that would allow an error to point to
a PCI bus address for a PCI error, for example.
Regards,
Mauro