linux-kernel - Re: [PATCH v4] x86/mce: retrieve poison range from hardware

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <62e2d9784bc58_37b17529451@dwillia2-xfh.jf.intel.com.notmuch>
Date:   Thu, 28 Jul 2022 11:46:16 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Jane Chu <jane.chu@...cle.com>,
        Dan Williams <dan.j.williams@...el.com>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "hch@....de" <hch@....de>,
        "nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>
Subject: Re: [PATCH v4] x86/mce: retrieve poison range from hardware

Jane Chu wrote:
> On 7/27/2022 1:01 PM, Dan Williams wrote:
> > Jane Chu wrote:
> >> On 7/27/2022 12:30 PM, Jane Chu wrote:
> >>> On 7/27/2022 12:24 PM, Jane Chu wrote:
> >>>> On 7/27/2022 11:56 AM, Dan Williams wrote:
> >>>>> Jane Chu wrote:
> >>>>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine
> >>>>>> poison granularity") that changed nfit_handle_mce() callback to report
> >>>>>> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been
> >>>>>> discovered that the mce->misc LSB field is 0x1000 bytes, hence
> >>>>>> injecting
> >>>>>> 2 back-to-back poisons and the driver ends up logging 8 badblocks,
> >>>>>> because 0x1000 bytes is 8 512-byte.
> >>>>>>
> >>>>>> Dan Williams noticed that apei_mce_report_mem_error() hardcode
> >>>>>> the LSB field to PAGE_SHIFT instead of consulting the input
> >>>>>> struct cper_sec_mem_err record.  So change to rely on hardware whenever
> >>>>>> support is available.
> >>>>>>
> >>>>>> Link:
> >>>>>> https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com
> >>>>>>
> >>>>>>
> >>>>>> Reviewed-by: Dan Williams <dan.j.williams@...el.com>
> >>>>>> Signed-off-by: Jane Chu <jane.chu@...cle.com>
> >>>>>> ---
> >>>>>>    arch/x86/kernel/cpu/mce/apei.c | 14 +++++++++++++-
> >>>>>>    1 file changed, 13 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/arch/x86/kernel/cpu/mce/apei.c
> >>>>>> b/arch/x86/kernel/cpu/mce/apei.c
> >>>>>> index 717192915f28..26d63818b2de 100644
> >>>>>> --- a/arch/x86/kernel/cpu/mce/apei.c
> >>>>>> +++ b/arch/x86/kernel/cpu/mce/apei.c
> >>>>>> @@ -29,15 +29,27 @@
> >>>>>>    void apei_mce_report_mem_error(int severity, struct
> >>>>>> cper_sec_mem_err *mem_err)
> >>>>>>    {
> >>>>>>        struct mce m;
> >>>>>> +    int grain = PAGE_SHIFT;
> >>>>>>        if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
> >>>>>>            return;
> >>>>>> +    /*
> >>>>>> +     * Even if the ->validation_bits are set for address mask,
> >>>>>> +     * to be extra safe, check and reject an error radius '0',
> >>>>>> +     * and fallback to the default page size.
> >>>>>> +     */
> >>>>>> +    if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) {
> >>>>>> +        grain = ~mem_err->physical_addr_mask + 1;
> >>>>>> +        if (grain == 1)
> >>>>>> +            grain = PAGE_SHIFT;
> >>>>>
> >>>>> Wait, if @grain is the number of bits to mask off the address, shouldn't
> >>>>> this be something like:
> >>>>>
> >>>>>       grain = min_not_zero(PAGE_SHIFT,
> >>>>> hweight64(~mem_err->physical_addr_mask));
> >>>>
> >>>> I see. I guess what you meant is
> >>>>      grain = min(PAGE_SHIFT, (1 +
> >>>> hweight64(~mem_err->physical_addr_mask)));
> >>>
> >>> Sorry, take that back, it won't work either.
> >>
> >> This will work,
> >>     grain = min_not_zero(PAGE_SHIFT - 1,
> >> hweight64(~mem_err->physical_addr_mask));
> >>     grain++;
> >> but too sophisticated?  I guess I prefer the simple "if" expression.
> > 
> > An "if" is fine, I was more pointing out that:
> > 
> >      hweight64(~mem_err->physical_addr_mask) + 1
> > 
> > ...and:
> > 
> >      ~mem_err->physical_addr_mask + 1;
> > 
> > ...give different results.
> 
> They are different indeed.  hweight64 returns the count of set bit while
> ~mem_err->physical_addr_mask returns a negated value.
> 
> According to the definition of "Physical Address Mask" -
> 
> https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf
> 
> Table N-31 Memory Error Record
> 
> Physical Address Mask 24 8   Defines the valid address bits in the 
> Physical Address field. The mask specifies the granularity of the 
> physical address which is dependent on the hw/ implementation factors 
> such as interleaving.
> 
> It appears that "Physical Address Mask" is defined more like PAGE_MASK
> rather than in bitops hweight64() ofter used to count the set bits as
> an indication of (e.g.) how many registers are in use.
> 
> Ans similar to PAGE_MASK, a valid "Physical Address Mask" should
> consist of a contiguous low 0 bits, not 1's and 0's mixed up.
> 
> So far, as far as I can see, the v4 patch still looks correct to me.
> Please let me know if I'm missing anything.

The v4 patch looks broken to me. If the address mask is
0xffffffffffffffc0 to indicate a cacheline error then:

    ~mem_err->physical_addr_mask + 1;

...results in a grain of 64 when it should be 6.