linux-kernel - Re: [PATCH v4] x86/mce: retrieve poison range from hardware

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2260c16e-7bb9-7315-5d5b-8656ca1ba93e@oracle.com>
Date:   Thu, 28 Jul 2022 20:29:38 +0000
From:   Jane Chu <jane.chu@...cle.com>
To:     Dan Williams <dan.j.williams@...el.com>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "hch@....de" <hch@....de>,
        "nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>
Subject: Re: [PATCH v4] x86/mce: retrieve poison range from hardware

On 7/28/2022 11:46 AM, Dan Williams wrote:
> Jane Chu wrote:
>> On 7/27/2022 1:01 PM, Dan Williams wrote:
>>> Jane Chu wrote:
>>>> On 7/27/2022 12:30 PM, Jane Chu wrote:
>>>>> On 7/27/2022 12:24 PM, Jane Chu wrote:
>>>>>> On 7/27/2022 11:56 AM, Dan Williams wrote:
>>>>>>> Jane Chu wrote:
>>>>>>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine
>>>>>>>> poison granularity") that changed nfit_handle_mce() callback to report
>>>>>>>> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been
>>>>>>>> discovered that the mce->misc LSB field is 0x1000 bytes, hence
>>>>>>>> injecting
>>>>>>>> 2 back-to-back poisons and the driver ends up logging 8 badblocks,
>>>>>>>> because 0x1000 bytes is 8 512-byte.
>>>>>>>>
>>>>>>>> Dan Williams noticed that apei_mce_report_mem_error() hardcode
>>>>>>>> the LSB field to PAGE_SHIFT instead of consulting the input
>>>>>>>> struct cper_sec_mem_err record.  So change to rely on hardware whenever
>>>>>>>> support is available.
>>>>>>>>
>>>>>>>> Link:
>>>>>>>> https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com
>>>>>>>>
>>>>>>>>
>>>>>>>> Reviewed-by: Dan Williams <dan.j.williams@...el.com>
>>>>>>>> Signed-off-by: Jane Chu <jane.chu@...cle.com>
>>>>>>>> ---
>>>>>>>>     arch/x86/kernel/cpu/mce/apei.c | 14 +++++++++++++-
>>>>>>>>     1 file changed, 13 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/kernel/cpu/mce/apei.c
>>>>>>>> b/arch/x86/kernel/cpu/mce/apei.c
>>>>>>>> index 717192915f28..26d63818b2de 100644
>>>>>>>> --- a/arch/x86/kernel/cpu/mce/apei.c
>>>>>>>> +++ b/arch/x86/kernel/cpu/mce/apei.c
>>>>>>>> @@ -29,15 +29,27 @@
>>>>>>>>     void apei_mce_report_mem_error(int severity, struct
>>>>>>>> cper_sec_mem_err *mem_err)
>>>>>>>>     {
>>>>>>>>         struct mce m;
>>>>>>>> +    int grain = PAGE_SHIFT;
>>>>>>>>         if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
>>>>>>>>             return;
>>>>>>>> +    /*
>>>>>>>> +     * Even if the ->validation_bits are set for address mask,
>>>>>>>> +     * to be extra safe, check and reject an error radius '0',
>>>>>>>> +     * and fallback to the default page size.
>>>>>>>> +     */
>>>>>>>> +    if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) {
>>>>>>>> +        grain = ~mem_err->physical_addr_mask + 1;
>>>>>>>> +        if (grain == 1)
>>>>>>>> +            grain = PAGE_SHIFT;
>>>>>>>
>>>>>>> Wait, if @grain is the number of bits to mask off the address, shouldn't
>>>>>>> this be something like:
>>>>>>>
>>>>>>>        grain = min_not_zero(PAGE_SHIFT,
>>>>>>> hweight64(~mem_err->physical_addr_mask));
>>>>>>
>>>>>> I see. I guess what you meant is
>>>>>>       grain = min(PAGE_SHIFT, (1 +
>>>>>> hweight64(~mem_err->physical_addr_mask)));
>>>>>
>>>>> Sorry, take that back, it won't work either.
>>>>
>>>> This will work,
>>>>      grain = min_not_zero(PAGE_SHIFT - 1,
>>>> hweight64(~mem_err->physical_addr_mask));
>>>>      grain++;
>>>> but too sophisticated?  I guess I prefer the simple "if" expression.
>>>
>>> An "if" is fine, I was more pointing out that:
>>>
>>>       hweight64(~mem_err->physical_addr_mask) + 1
>>>
>>> ...and:
>>>
>>>       ~mem_err->physical_addr_mask + 1;
>>>
>>> ...give different results.
>>
>> They are different indeed.  hweight64 returns the count of set bit while
>> ~mem_err->physical_addr_mask returns a negated value.
>>
>> According to the definition of "Physical Address Mask" -
>>
>> https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf
>>
>> Table N-31 Memory Error Record
>>
>> Physical Address Mask 24 8   Defines the valid address bits in the
>> Physical Address field. The mask specifies the granularity of the
>> physical address which is dependent on the hw/ implementation factors
>> such as interleaving.
>>
>> It appears that "Physical Address Mask" is defined more like PAGE_MASK
>> rather than in bitops hweight64() ofter used to count the set bits as
>> an indication of (e.g.) how many registers are in use.
>>
>> Ans similar to PAGE_MASK, a valid "Physical Address Mask" should
>> consist of a contiguous low 0 bits, not 1's and 0's mixed up.
>>
>> So far, as far as I can see, the v4 patch still looks correct to me.
>> Please let me know if I'm missing anything.
> 
> The v4 patch looks broken to me. If the address mask is
> 0xffffffffffffffc0 to indicate a cacheline error then:
> 
>      ~mem_err->physical_addr_mask + 1;
> 
> ...results in a grain of 64 when it should be 6.

Right, it's the exponent that's needed, so back to __ffs64().
Sorry for the detour.  v5 is coming next.

thanks!
-jane