linux-kernel - RE: [PATCH] acpi/nfit: badrange report spill over to clean range

Message-ID: <62ce16518e7d3_6070c29447@dwillia2-xfh.jf.intel.com.notmuch>
Date:   Tue, 12 Jul 2022 17:48:17 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Jane Chu <jane.chu@...cle.com>, <dan.j.williams@...el.com>,
        <hch@...radead.org>, <vishal.l.verma@...el.com>,
        <dave.jiang@...el.com>, <ira.weiny@...el.com>,
        <nvdimm@...ts.linux.dev>, <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] acpi/nfit: badrange report spill over to clean range

Jane Chu wrote:
> Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison
> granularity") changed the nfit_handle_mce() callback to report a badrange for
> each poison at an alignment indicated by 1ULL << MCI_MISC_ADDR_LSB(mce->misc)
> instead of the hardcoded L1_CACHE_BYTES. However, recently, on a server
> populated with Intel DCPMEM v2 DIMMs, 1UL << MCI_MISC_ADDR_LSB(mce->misc)
> turns out to be 4KiB, or 8 512-byte blocks. Consequently, injecting 2
> back-to-back poisons via ndctl results in 8 reported poisons.
> 
> [29076.590281] {3}[Hardware Error]:   physical_address: 0x00000040a0602400
> [..]
> [29076.619447] Memory failure: 0x40a0602: recovery action for dax page: Recovered
> [29076.627519] mce: [Hardware Error]: Machine check events logged
> [29076.634033] nfit ACPI0012:00: addr in SPA 1 (0x4080000000, 0x1f80000000)
> [29076.648805] nd_bus ndbus0: XXX nvdimm_bus_add_badrange: (0x40a0602000, 0x1000)
> [..]
> [29078.634817] {4}[Hardware Error]:   physical_address: 0x00000040a0602600
> [..]
> [29079.595327] nfit ACPI0012:00: addr in SPA 1 (0x4080000000, 0x1f80000000)
> [29079.610106] nd_bus ndbus0: XXX nvdimm_bus_add_badrange: (0x40a0602000, 0x1000)
> [..]
> {
>   "dev":"namespace0.0",
>   "mode":"fsdax",
>   "map":"dev",
>   "size":33820770304,
>   "uuid":"a1b0f07f-747f-40a8-bcd4-de1560a1ef75",
>   "sector_size":512,
>   "align":2097152,
>   "blockdev":"pmem0",
>   "badblock_count":8,
>   "badblocks":[
>     {
>       "offset":8208,
>       "length":8,
>       "dimms":[
>         "nmem0"
>       ]
>     }
>   ]
> }
> 
> So, 1UL << MCI_MISC_ADDR_LSB(mce->misc) is an unreliable indicator of poison
> radius and shouldn't be used. Moreover, since each injected poison is
> reported independently, any alignment up to 512 bytes appears to work:
> L1_CACHE_BYTES (though inaccurate), 256 bytes (as ars->length reports),
> or 512 bytes.
> 
> To get around this issue, 512-bytes is chosen as the alignment because
>   a. it happens to be the badblock granularity,
>   b. ndctl inject-error cannot inject more than one poison to a 512-byte block,
>   c. it is architecture agnostic.

I fail to see the kernel bug. Yes, you injected fewer than 8
"badblocks" of poison and the hardware reported 8 blocks of poison, but
that's not the kernel's fault, that's the hardware. What happens when the
hardware really does detect 8 blocks of consecutive poison and this
implementation decides to record only 1 at a time?

It seems the fix you want is for the hardware to report the precise
error bounds and that 1UL << MCI_MISC_ADDR_LSB(mce->misc) does not have
that precision in this case.

However, the ARS engine can likely return the precise error ranges, so I
think the fix is to use the address range indicated by 1UL <<
MCI_MISC_ADDR_LSB(mce->misc) to filter the results of a short ARS
scrub request that asks the device for the precise error list.