Message-ID: <e98cf66a-472d-e322-5f7d-01661fd98ab2@intel.com>
Date: Thu, 8 Jul 2021 10:15:24 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Brijesh Singh <brijesh.singh@....com>, x86@...nel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
linux-efi@...r.kernel.org, platform-driver-x86@...r.kernel.org,
linux-coco@...ts.linux.dev, linux-mm@...ck.org,
linux-crypto@...r.kernel.org
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
Tom Lendacky <thomas.lendacky@....com>,
"H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Sergio Lopez <slp@...hat.com>, Peter Gonda <pgonda@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Dov Murik <dovmurik@...ux.ibm.com>,
Tobin Feldman-Fitzthum <tobin@....com>,
Borislav Petkov <bp@...en8.de>,
Michael Roth <michael.roth@....com>,
Vlastimil Babka <vbabka@...e.cz>, tony.luck@...el.com,
npmccallum@...hat.com, brijesh.ksingh@...il.com
Subject: Re: [PATCH Part2 RFC v4 09/40] x86/fault: Add support to dump RMP
entry on fault

On 7/8/21 10:11 AM, Brijesh Singh wrote:
> On 7/8/21 11:58 AM, Dave Hansen wrote:
>>> Logically its going to be tricky to figure out which exact entry caused
>>> the fault, hence I dump any non-zero entry. I understand it may dump
>>> some useless.
>>
>> What's tricky about it?
>>
>> Sure, there's a possibility that more than one entry could contribute to
>> a fault. But, you always know *IF* an entry could contribute to a fault.
>>
>> I'm fine if you run through the logic, don't find a known reason
>> (specific RMP entry) for the fault, and dump the whole table in that
>> case. But, unconditionally polluting the kernel log with noise isn't
>> very nice for debugging.
>
> The tricky part is to determine which undocumented bit to check to know
> that we should stop dump. I can go with your suggestion that first try
> with the known reasons and fallback to dump whole table for unknown
> reasons only.

You *can't* stop because of undocumented bits. Fundamentally. You
literally don't know if the bit means "this caused a fault" versus "this
definitely couldn't cause a fault".

Basically, if we get to the point of dumping the whole table, we should
also spit out an error message saying that the kernel is dazed and
confused and can't figure out why the hardware caused a fault. Then,
dump out the whole table so that the "hardware" folks can have a look.
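
A rough userspace sketch of that flow, in case it helps. Everything in it is an assumption made for illustration: struct rmp_entry_sketch is not the real RMP entry layout, and entry_explains_fault() stands in for whatever documented checks the real code would run. The shape is the only point: report the entries with a known reason first, and only when there are none print the "can't explain this" message and dump everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified entry. NOT the real RMP entry layout. */
struct rmp_entry_sketch {
	uint64_t pfn;
	bool assigned;	/* stand-in for a documented bit */
	bool validated;	/* stand-in for a documented bit */
	uint64_t raw;	/* everything else, undocumented bits included */
};

/*
 * Assumed rule, for illustration only: a host write to an assigned
 * page is a known reason for a fault.  Undocumented bits in 'raw'
 * are never consulted, since their meaning is unknown.
 */
static bool entry_explains_fault(const struct rmp_entry_sketch *e,
				 bool host_write)
{
	return host_write && e->assigned;
}

static void dump_entry(const struct rmp_entry_sketch *e)
{
	printf("RMP sketch: pfn=%#llx assigned=%d validated=%d raw=%#llx\n",
	       (unsigned long long)e->pfn, e->assigned, e->validated,
	       (unsigned long long)e->raw);
}

/*
 * Returns the number of entries with a known reason for the fault.
 * A return of 0 means no reason was found and all entries were dumped.
 */
static size_t dump_rmp_for_fault(const struct rmp_entry_sketch *tbl,
				 size_t n, bool host_write)
{
	size_t i, hits = 0;

	assert(tbl != NULL || n == 0);

	/* First pass: only the entries we can actually blame. */
	for (i = 0; i < n; i++) {
		if (entry_explains_fault(&tbl[i], host_write)) {
			dump_entry(&tbl[i]);
			hits++;
		}
	}
	if (hits)
		return hits;

	/* Fallback: say so explicitly, then dump everything. */
	printf("RMP sketch: no known reason for this fault, dumping all %zu entries\n",
	       n);
	for (i = 0; i < n; i++)
		dump_entry(&tbl[i]);

	return 0;
}
```

In the common case this logs one line per culprit entry; the full dump only shows up together with the message that the known checks came up empty.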