Message-ID: <20250219160930.GF337534@yaz-khff2.amd.com>
Date: Wed, 19 Feb 2025 11:09:30 -0500
From: Yazen Ghannam <yazen.ghannam@....com>
To: "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>
Cc: "x86@...nel.org" <x86@...nel.org>, "Luck, Tony" <tony.luck@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"Smita.KoralahalliChannabasappa@....com" <Smita.KoralahalliChannabasappa@....com>
Subject: Re: [PATCH v2 13/16] x86/mce: Unify AMD DFR handler with MCA Polling
On Tue, Feb 18, 2025 at 07:37:18AM +0000, Zhuo, Qiuxu wrote:
> > From: Yazen Ghannam <yazen.ghannam@....com>
> > [...]
> > +static bool smca_should_log_poll_error(enum mcp_flags flags, struct mce_hw_err *err)
> > +{
> > + struct mce *m = &err->m;
> > +
> > + /*
> > + * If this is a deferred error found in MCA_STATUS, then clear
> > + * the redundant data from the MCA_DESTAT register.
> > + */
> > + if (m->status & MCI_STATUS_VAL) {
> > + if (m->status & MCI_STATUS_DEFERRED)
> > + mce_wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
> > +
> > + return true;
> > + }
> > +
> > + /*
> > + * If the MCA_DESTAT register has valid data, then use
> > + * it as the status register.
> > + */
> > + m->status = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(m->bank));
> > +
> > + if (!(m->status & MCI_STATUS_VAL))
> > + return false;
> > +
> > + /*
> > + * Gather all relevant data now and log the record before clearing
> > + * the deferred status register. This avoids needing to go back to
> > + * the polling function for these actions.
> > + */
> > + mce_read_aux(err, m->bank);
> > +
> > + if (m->status & MCI_STATUS_ADDRV)
> > + m->addr = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DEADDR(m->bank));
> > +
> > + smca_extract_err_addr(m);
> > + m->severity = mce_severity(m, NULL, NULL, false);
> > +
>
> Is the following check in machine_check_poll() needed before
> queuing/logging AMD's deferred error?
>
> if (mca_cfg.dont_log_ce && !mce_usable_address(m))
> //Just clear MCA_STATUS, but not queue/log errors.
>
Good question. Deferred errors are uncorrectable errors that don't need
immediate action. They are not correctable errors, so the 'dont_log_ce'
flag shouldn't apply.
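
To make the distinction concrete, here is a rough userspace sketch, not
kernel code and not part of the patch. The MCI_STATUS_* bit positions
match the kernel definitions, but is_corrected_error() and
dont_log_ce_applies() are hypothetical helpers made up for illustration.
The idea is that a 'dont_log_ce'-style filter would only ever drop a
record that is valid, not uncorrected, and not deferred:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined for MCA_STATUS in the kernel headers. */
#define MCI_STATUS_VAL		(1ULL << 63)	/* register holds a valid error */
#define MCI_STATUS_UC		(1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_DEFERRED	(1ULL << 44)	/* deferred (uncorrectable, no immediate action) */

/* Hypothetical helper: true only for a valid, corrected error. */
static bool is_corrected_error(uint64_t status)
{
	if (!(status & MCI_STATUS_VAL))
		return false;

	/* Deferred errors are uncorrectable, so they never count as CEs. */
	return !(status & (MCI_STATUS_UC | MCI_STATUS_DEFERRED));
}

/*
 * Hypothetical helper: would a 'dont_log_ce'-style filter drop this record?
 * 'usable_addr' stands in for the result of mce_usable_address().
 */
static bool dont_log_ce_applies(uint64_t status, bool dont_log_ce, bool usable_addr)
{
	return dont_log_ce && !usable_addr && is_corrected_error(status);
}
```

With this, a status of MCI_STATUS_VAL | MCI_STATUS_DEFERRED is never
filtered, regardless of the flag or the address validity.
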
Thanks,
Yazen