Message-ID: <CAGtprH9SpjSnR-u-AH+t6BB+0pzHbgLTUv0pu+dkYR=ZzEYicA@mail.gmail.com>
Date: Wed, 30 Jul 2025 07:20:04 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: Dave Hansen <dave.hansen@...el.com>, "Luck, Tony" <tony.luck@...el.com>,
Borislav Petkov <bp@...en8.de>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, "x86@...nel.org" <x86@...nel.org>,
H Peter Anvin <hpa@...or.com>, "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>, "Huang, Kai" <kai.huang@...el.com>,
"Chatre, Reinette" <reinette.chatre@...el.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>,
"tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"Zhao, Yan Y" <yan.y.zhao@...el.com>, "Gao, Chao" <chao.gao@...el.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "seanjc@...gle.com" <seanjc@...gle.com>
Subject: Re: [PATCH 1/2] x86/mce: Fix missing address mask in recovery for
errors in TDX/SEAM non-root mode
On Wed, Jul 30, 2025 at 3:55 AM Adrian Hunter <adrian.hunter@...el.com> wrote:
>
> On 27/06/2025 19:33, Dave Hansen wrote:
> > On 6/27/25 09:24, Luck, Tony wrote:
> >> We've been sending a combined key+address in the "mce->addr" to
> >> user space for a while. Has anyone built infrastructure on top of that?
> >
> > I'm not sure they can do anything useful with an address that has the
> > KeyID in the first place. The partitioning scheme is in an MSR, so
> > they'd need to be doing silly gymnastics to even decode the address.
> >
> > Userspace can deal with the KeyID not being in the address. It's been
> > the default for ages. So, if we take it back out, I'd expect it fixes
> > more things than it breaks.
> >
> > So, yeah, we should carefully consider it. But it still 100% looks like
> > the right thing to me to detangle the KeyID and physical address in the ABI.
>
> Coming back to this after a bit of a break.
>
> It feels unlikely to me that any users are expecting KeyID in mce->addr.
>
> Looking at user space programs like mcelog and rasdaemon gives the
> impression that mce->addr contains only an address.
>
> The UAPI header file describes addr as "Bank's MCi_ADDR MSR", but what
> mce_read_aux() does tends to contradict that, especially for AMD
> SMCA.
>
> But there are also additional places where it seems like MCI_ADDR_PHYSADDR
> is missing:
>
>   tdx_dump_mce_info()
>     paddr_is_tdx_private()
>       __seamcall_ret(TDH_PHYMEM_PAGE_RDMD, &args)
>         TDH_PHYMEM_PAGE_RDMD expects KeyID bits to be zero
>
>   skx_mce_output_error()
>     edac_mc_handle_error()
>       expects page_frame_number, so without KeyID
>
> The KeyID is probably only useful for potentially identifying the TD, but
> given that the TD incurs a FATAL error, that may be obvious anyway.
>
> So removing the KeyID from mce->addr looks like the right thing to do.
>
> Note AFAICT there are 3 kernel APIs that deal with the MCE address:
>
>   Device /dev/mcelog, which outputs struct mce
>   Tracepoint mce:mce_record, which outputs members from struct mce
>   Tracepoint ras:mc_event, where the kernel constructs the address
>     from page_frame_number, implying that KeyID should not be present
>
> I guess it would be sensible to ask what customers think.
>
> Vishal, do you know anyone at Google who deals with handling machine
> check information, and who might have an opinion on this?
>
I think it's safe to assume Google hasn't built any userspace
infrastructure that needs the KeyID bits in the MCE address. Beyond
that, Dave's suggestion to "detangle the KeyID and physical address in
the ABI" makes sense to me.
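
For anyone following along, here is a rough userspace sketch of what
"detangling" means at the bit level. It is not the kernel code and not
the patch under discussion. The helper names (low_bits_mask,
strip_keyid, extract_keyid) are made up for illustration, and the
layout is an assumption: the KeyID is taken to occupy the top
keyid_bits of a max_phys_bits-wide physical address, with both widths
supplied by the caller rather than read from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low 'bits' bits set (1..64). */
static inline uint64_t low_bits_mask(unsigned int bits)
{
	assert(bits > 0 && bits <= 64);
	return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical helper: drop the KeyID field from a raw machine-check
 * address. Assumes the KeyID sits in the top 'keyid_bits' of a
 * 'max_phys_bits'-wide physical address.
 */
static inline uint64_t strip_keyid(uint64_t raw_addr,
				   unsigned int max_phys_bits,
				   unsigned int keyid_bits)
{
	assert(keyid_bits < max_phys_bits);
	return raw_addr & low_bits_mask(max_phys_bits - keyid_bits);
}

/*
 * Hypothetical helper: recover the KeyID field under the same layout
 * assumption, e.g. for logging it separately from the address.
 */
static inline uint64_t extract_keyid(uint64_t raw_addr,
				     unsigned int max_phys_bits,
				     unsigned int keyid_bits)
{
	assert(keyid_bits < max_phys_bits);
	return (raw_addr & low_bits_mask(max_phys_bits)) >>
	       (max_phys_bits - keyid_bits);
}
```

With illustrative widths of 52 physical-address bits and 6 KeyID bits,
strip_keyid() keeps the low 46 bits, which is the form consumers that
expect a page frame number can use directly, while extract_keyid()
returns the field that would otherwise be mixed into the address.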