Message-ID: <20170405200430.aagtouutiqyopp7a@pd.tnic>
Date: Wed, 5 Apr 2017 22:04:30 +0200
From: Borislav Petkov <bp@...en8.de>
To: "Ghannam, Yazen" <Yazen.Ghannam@....com>
Cc: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
Tony Luck <tony.luck@...el.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA MCA_DE{STAT,ADDR} registers
On Wed, Apr 05, 2017 at 07:29:48PM +0000, Ghannam, Yazen wrote:
> If it's set, then I expect a Deferred error in MCA_STATUS since any Correctable
> Errors will be overwritten. Multiple bank types can generate Deferred errors
> so there may also be cases where, for some types, a valid Uncorrectable error
> happens and overwrites the Deferred error before we can handle it, in which
> case we lose the Deferred error if we don't check MCA_DESTAT.
So if we have a UE, wouldn't that raise an #MC? I guess in such cases
we should concentrate only on the deferred errors and let the #MC
handler deal with the UEs, as we do now.
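A minimal sketch of that split, for illustration only: the bit positions
match the kernel's MCI_STATUS_VAL (63), MCI_STATUS_UC (61) and
MCI_STATUS_DEFERRED (44), while classify_status() and enum err_kind are
hypothetical names. A deferred error is handled here; a UE is left to the
#MC handler.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as in the kernel's MCi_STATUS definitions. */
#define MCI_STATUS_VAL      (1ULL << 63)
#define MCI_STATUS_UC       (1ULL << 61)
#define MCI_STATUS_DEFERRED (1ULL << 44)

/* Hypothetical classification, used only in this sketch. */
enum err_kind { ERR_NONE, ERR_DEFERRED, ERR_UE_LEAVE_TO_MC, ERR_CE };

static enum err_kind classify_status(uint64_t status)
{
	/* Nothing valid logged in this register. */
	if (!(status & MCI_STATUS_VAL))
		return ERR_NONE;

	/* Deferred errors are what the deferred error interrupt handles. */
	if (status & MCI_STATUS_DEFERRED)
		return ERR_DEFERRED;

	/* A UE raises #MC, so leave it to the #MC handler. */
	if (status & MCI_STATUS_UC)
		return ERR_UE_LEAVE_TO_MC;

	return ERR_CE;
}
```

Checking the Deferred bit before UC means a deferred error is never
mistaken for one that the #MC handler owns.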
> If it's not set, then it's possible to have a valid Correctable error in MCA_STATUS
> while the valid Deferred error is in MCA_DESTAT.
What's logging the CE? We probably should log it too before something
overwrites it.
Anyway, ok, I think I know what needs to happen now:
amd_deferred_error_interrupt:

	if (__log_error_deferred(bank))
		return;
This one reads MC?_STATUS and does the logging for when the deferred
error is in the normal MSRs. It returns true if it found and logged a
deferred error. It reads and hands down both MC?_STATUS and MC?_ADDR to
__log_error() so that it doesn't have to read MC?_STATUS twice.
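A userspace sketch of __log_error_deferred(), assuming a mocked register
file: struct mock_bank, banks[], NR_BANKS, read_status(), read_addr() and
the counters are hypothetical stand-ins for rdmsrl() on the real MSRs.
STATUS is read exactly once and the values are handed down. Per the
"ignore that case for now" note, a non-deferred error is not logged here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MCI_STATUS_VAL      (1ULL << 63)
#define MCI_STATUS_DEFERRED (1ULL << 44)
#define NR_BANKS 4

/* Hypothetical stand-in for the per-bank MC?_STATUS / MC?_ADDR MSRs. */
struct mock_bank { uint64_t status, addr; };
static struct mock_bank banks[NR_BANKS];

/* Counters so the behaviour can be checked; not part of any real API. */
static int nr_reads, nr_logged;
static uint64_t last_logged_addr;

static uint64_t read_status(unsigned int bank)
{
	assert(bank < NR_BANKS);
	nr_reads++;
	return banks[bank].status;
}

static uint64_t read_addr(unsigned int bank)
{
	assert(bank < NR_BANKS);
	nr_reads++;
	return banks[bank].addr;
}

/* Only logs the values it is handed; it never reads the MSRs itself. */
static void __log_error(unsigned int bank, uint64_t status, uint64_t addr)
{
	nr_logged++;
	last_logged_addr = addr;
	printf("bank %u: status 0x%016llx addr 0x%016llx\n", bank,
	       (unsigned long long)status, (unsigned long long)addr);
}

/* Returns true if a valid deferred error was found in the normal MSRs. */
static bool __log_error_deferred(unsigned int bank)
{
	uint64_t status = read_status(bank);	/* read exactly once */

	if (!(status & MCI_STATUS_VAL) || !(status & MCI_STATUS_DEFERRED))
		return false;

	__log_error(bank, status, read_addr(bank));
	return true;
}
```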
If __log_error_deferred() has read a different type of error, do we
still log it? I'm not sure about this. I guess we can ignore that case
for now.
Then:

	if (mca_flags.smca)
		__log_error_deferred_smca(bank);
which handles the SMCA case. It too reads MSR_AMD64_SMCA_MCx_DESTAT
and MSR_AMD64_SMCA_MCx_DEADDR and hands them down to __log_error() for
logging.
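Putting the two steps together, a userspace sketch of the per-bank flow,
again with a mocked register file: struct mock_bank, banks[], the smca
flag (standing in for mca_flags.smca) and
amd_deferred_error_interrupt_bank() are hypothetical. It assumes
MCA_DESTAT uses the same VAL and Deferred bit layout as MCA_STATUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MCI_STATUS_VAL      (1ULL << 63)
#define MCI_STATUS_DEFERRED (1ULL << 44)
#define NR_BANKS 4

/* Hypothetical mock: the normal STATUS/ADDR pair plus SMCA DESTAT/DEADDR. */
struct mock_bank { uint64_t status, addr, destat, deaddr; };
static struct mock_bank banks[NR_BANKS];
static bool smca = true;		/* stands in for mca_flags.smca */
static int nr_logged;
static uint64_t last_status, last_addr;

static void __log_error(unsigned int bank, uint64_t status, uint64_t addr)
{
	nr_logged++;
	last_status = status;
	last_addr = addr;
	printf("bank %u: status 0x%016llx addr 0x%016llx\n", bank,
	       (unsigned long long)status, (unsigned long long)addr);
}

static bool is_valid_deferred(uint64_t status)
{
	return (status & MCI_STATUS_VAL) && (status & MCI_STATUS_DEFERRED);
}

/* Deferred error in the normal MSRs: log it and report success. */
static bool __log_error_deferred(unsigned int bank)
{
	assert(bank < NR_BANKS);
	uint64_t status = banks[bank].status;

	if (!is_valid_deferred(status))
		return false;
	__log_error(bank, status, banks[bank].addr);
	return true;
}

/* SMCA: the deferred error may only be in MCA_DESTAT/MCA_DEADDR. */
static void __log_error_deferred_smca(unsigned int bank)
{
	assert(bank < NR_BANKS);
	uint64_t destat = banks[bank].destat;

	if (is_valid_deferred(destat))
		__log_error(bank, destat, banks[bank].deaddr);
}

static void amd_deferred_error_interrupt_bank(unsigned int bank)
{
	if (__log_error_deferred(bank))
		return;
	if (smca)
		__log_error_deferred_smca(bank);
}
```

This covers the case where a CE sits in MCA_STATUS while the deferred
error is only in MCA_DESTAT.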
For the __log_error() call in amd_threshold_interrupt(), you define a
log_error() wrapper which reads the default MSRs and hands them down to
__log_error().
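A sketch of that wrapper under the same mocking assumption: struct
mock_bank and banks[] are hypothetical stand-ins for the default
MC?_STATUS and MC?_ADDR MSRs. In this sketch __log_error() does no
filtering; it just records what it is handed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NR_BANKS 4

/* Hypothetical mock of the default MC?_STATUS / MC?_ADDR MSRs. */
struct mock_bank { uint64_t status, addr; };
static struct mock_bank banks[NR_BANKS];
static int nr_logged;
static uint64_t last_status, last_addr;

/* Logs only the values it is given; no MSR access. */
static void __log_error(unsigned int bank, uint64_t status, uint64_t addr)
{
	nr_logged++;
	last_status = status;
	last_addr = addr;
	printf("bank %u: status 0x%016llx addr 0x%016llx\n", bank,
	       (unsigned long long)status, (unsigned long long)addr);
}

/* Threshold-interrupt path: read the default MSRs, then hand them down. */
static void log_error(unsigned int bank)
{
	assert(bank < NR_BANKS);
	__log_error(bank, banks[bank].status, banks[bank].addr);
}
```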
So __log_error() always gets the STATUS and ADDR MSR values handed
down; it doesn't need to read the MSRs itself and only logs them.
--
Regards/Gruss,
Boris.