linux-kernel - RE: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA MCA

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <BN6PR1201MB01317CEF66FCA08CF2843663F80C0@BN6PR1201MB0131.namprd12.prod.outlook.com>
Date:   Fri, 7 Apr 2017 20:37:03 +0000
From:   "Ghannam, Yazen" <Yazen.Ghannam@....com>
To:     Borislav Petkov <bp@...en8.de>
CC:     "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        Tony Luck <tony.luck@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA MCA_DE{STAT,ADDR}
 registers

> -----Original Message-----
> From: Borislav Petkov [mailto:bp@...en8.de]
> Sent: Wednesday, April 05, 2017 4:05 PM
> To: Ghannam, Yazen <Yazen.Ghannam@....com>
> Cc: linux-edac@...r.kernel.org; Tony Luck <tony.luck@...el.com>;
> x86@...nel.org; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA
> MCA_DE{STAT,ADDR} registers
> 
> On Wed, Apr 05, 2017 at 07:29:48PM +0000, Ghannam, Yazen wrote:
> > If it's set, then I expect a Deferred error in MCA_STATUS since any
> > Correctable Errors will be overwritten. Multiple bank types can
> > generate Deferred errors so there may also be cases where for some
> > types a valid Uncorrectable error happens and overwrites the Deferred
> > error before we can handle it. In which case we lose the Deferred error if
> we don't check MCA_DESTAT.
> 
> So if we have an UE, wouldn't that raise an #MC? I guess in such cases we
> should concentrate only on the deferred errors and let the #MC handler deal
> with them. As we do now.
> 

Right.

> > If it's not set, then it's possible to have a valid Correctable error
> > in MCA_STATUS while the valid Deferred error is in MCA_DESTAT.
> 
> What's logging the CE? We probably should log it too before something
> overwrites it.
> 

CEs will get picked up by polling or if we hit a threshold.

> Anyway, ok, I think I know what needs to happen now:
> 
> amd_deferred_error_interrupt:
> 
> 	if (__log_error_deferred(bank))
> 		return;
> 
> This one read MC?_STATUS and does the logging for when the deferred error
> is in the normal MSRs. It returns true if it succeeded. It reads and hands down
> both MC?_STATUS and MC?_ADDR to __log_error() so that it doesn't have to
> read MC?_STATUS twice.
> 

Okay. But do we need a return value? I'm thinking we can go through all banks
and log any and all Deferred errors rather than just the first one we find. I asked
and this is the preferred method like how we do in the #MC handler. The same
applies to the thresholding interrupt handler.

> If __log_error_deferred() has read a different type of error, we still log it? I'm
> not sure about this. I guess we can ignore that case for now.
> 

Yeah, we should ignore other error types. They'll get picked up by other methods.

> Then:
> 
> 	if (mca_flags.smca)
> 		__log_error_deferred_smca(bank));
> 
> which handles the SMCA case. It too reads MSR_AMD64_SMCA_MCx_DESTAT
> and MSR_AMD64_SMCA_MCx_DEADDR and hands them down to
> __log_error() for logging.
> 

Can this just be included in the __log_error_deferred()? I'm thinking something
similar to one of your earlier suggestions. Like this:

static void
__log_error_deferred(unsigned int bank)
{
        u64 status, addr;
        bool logged = false;

        rdmsrl(msr_ops.status, status);

        if (is_deferred_error(status)) {
                rdmsrl(msr_ops.addr, addr);

                __log_error(bank, status, addr, 0);

                wrmsrl(msr_ops.status, 0);

                logged = true;
        }

        if (!mca_flags.smca)
                return;

        if (logged) {
                wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT, 0);
                return;
        }

        rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), status);

        if (is_deferred_error(status)) {
                rdmsrl(MSR_AMD64_SMCA_MCx_DEADDR, addr);

                __log_error(bank, status, addr, 0);

                wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT, 0);
        }
}

> For the __log_error() call in amd_threshold_interrupt(), you define a
> log_error() wrapper which reads the default MSRs and hands them down to
> __log_error().
> 

Okay.

> So __log_error() always gets STATUS and ADDR MSR values and it doesn't
> need to read them from the MSRs but only log them.
> 

Okay.

Thanks,
Yazen