[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b3f0bfcd-9e7c-cc71-e91d-d0f28053dea9@oracle.com>
Date: Fri, 15 Apr 2022 16:18:57 +0000
From: Jane Chu <jane.chu@...cle.com>
To: Dan Williams <dan.j.williams@...el.com>
CC: david <david@...morbit.com>, "Darrick J. Wong" <djwong@...nel.org>,
Christoph Hellwig <hch@...radead.org>,
Vishal L Verma <vishal.l.verma@...el.com>,
Dave Jiang <dave.jiang@...el.com>,
Alasdair Kergon <agk@...hat.com>,
Mike Snitzer <snitzer@...hat.com>,
device-mapper development <dm-devel@...hat.com>,
"Weiny, Ira" <ira.weiny@...el.com>,
Matthew Wilcox <willy@...radead.org>,
Vivek Goyal <vgoyal@...hat.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux NVDIMM <nvdimm@...ts.linux.dev>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-xfs <linux-xfs@...r.kernel.org>, X86 ML <x86@...nel.org>,
"luto@...nel.org" <luto@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"dave.hansen@...el.com" <dave.hansen@...el.com>
Subject: Re: [PATCH v7 3/6] mce: fix set_mce_nospec to always unmap the whole
page
On 4/13/2022 7:32 PM, Dan Williams wrote:
> On Wed, Apr 13, 2022 at 4:36 PM Jane Chu <jane.chu@...cle.com> wrote:
>>
>> On 4/11/2022 4:27 PM, Dan Williams wrote:
>>> On Tue, Apr 5, 2022 at 12:48 PM Jane Chu <jane.chu@...cle.com> wrote:
>>>>
>>>> The set_memory_uc() approach doesn't work well in all cases.
>>>> For example, when "The VMM unmapped the bad page from guest
>>>> physical space and passed the machine check to the guest."
>>>> "The guest gets virtual #MC on an access to that page.
>>>> When the guest tries to do set_memory_uc() and instructs
>>>> cpa_flush() to do clean caches that results in taking another
>>>> fault / exception perhaps because the VMM unmapped the page
>>>> from the guest."
>>>>
>>>> Since the driver has special knowledge to handle NP or UC,
>>>
>>> I think a patch is needed before this one to make this statement true? I.e.:
>>>
>>> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
>>> index ee8d9973f60b..11641f55025a 100644
>>> --- a/drivers/acpi/nfit/mce.c
>>> +++ b/drivers/acpi/nfit/mce.c
>>> @@ -32,6 +32,7 @@ static int nfit_handle_mce(struct notifier_block
>>> *nb, unsigned long val,
>>> */
>>> mutex_lock(&acpi_desc_lock);
>>> list_for_each_entry(acpi_desc, &acpi_descs, list) {
>>> + unsigned int align = 1UL << MCI_MISC_ADDR_LSB(mce->misc);
>>> struct device *dev = acpi_desc->dev;
>>> int found_match = 0;
>>>
>>> @@ -63,8 +64,7 @@ static int nfit_handle_mce(struct notifier_block
>>> *nb, unsigned long val,
>>>
>>> /* If this fails due to an -ENOMEM, there is little we can do */
>>> nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
>>> - ALIGN(mce->addr, L1_CACHE_BYTES),
>>> - L1_CACHE_BYTES);
>>> + ALIGN(mce->addr, align), align);
>>> nvdimm_region_notify(nfit_spa->nd_region,
>>> NVDIMM_REVALIDATE_POISON);
>>>
>>
>> Dan, I tried the above change, and this is what I got after injecting 8
>> back-to-back poisons, then read them and received SIGBUS/BUS_MCEERR_AR,
>> then repair via the v7 patch which works until this change is added.
>>
>> [ 6240.955331] nfit ACPI0012:00: XXX, align = 100
>> [ 6240.960300] nfit ACPI0012:00: XXX, ALIGN(mce->addr,
>> L1_CACHE_BYTES)=1851600400, L1_CACHE_BYTES=40, ALIGN(mce->addr,
>> align)=1851600400
>> [..]
>> [ 6242.052277] nfit ACPI0012:00: XXX, align = 100
>> [ 6242.057243] nfit ACPI0012:00: XXX, ALIGN(mce->addr,
>> L1_CACHE_BYTES)=1851601000, L1_CACHE_BYTES=40, ALIGN(mce->addr,
>> align)=1851601000
>> [..]
>> [ 6244.917198] nfit ACPI0012:00: XXX, align = 1000
>> [ 6244.922258] nfit ACPI0012:00: XXX, ALIGN(mce->addr,
>> L1_CACHE_BYTES)=1851601200, L1_CACHE_BYTES=40, ALIGN(mce->addr,
>> align)=1851602000
>> [..]
>>
>> All 8 poisons remain uncleared.
>>
>> Without further investigation, I don't know why the failure.
>> Could we mark this change to a follow-on task?
>
> Perhaps a bit more debug before kicking this can down the road...
>
> I'm worried that this means that the driver is not accurately tracking
> poison data For example, that last case the hardware is indicating a
> full page clobber, but the old code would only track poison from
> 1851601200 to 1851601400 (i.e. the first 512 bytes from the base error
> address).
I would appear so, but the old code tracking seems to be working
correctly. For example, injecting 8 back-to-back poison, the
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/badblocks
cpatures all 8 of them, that spans 2K range, right? I never had issue
about missing poison in my tests.
>
> Oh... wait, I think there is a second bug here, that ALIGN should be
> ALIGN_DOWN(). Does this restore the result you expect?
That's it, ALIGN_DOWN fixed the issue, thanks!!
I'll add a patch, suggested-by you.
>
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index ee8d9973f60b..d7a52238a741 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -63,8 +63,7 @@ static int nfit_handle_mce(struct notifier_block
> *nb, unsigned long val,
>
> /* If this fails due to an -ENOMEM, there is little we can do */
> nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
> - ALIGN(mce->addr, L1_CACHE_BYTES),
> - L1_CACHE_BYTES);
> + ALIGN_DOWN(mce->addr, align), align);
> nvdimm_region_notify(nfit_spa->nd_region,
> NVDIMM_REVALIDATE_POISON);
>
>
>> The driver knows a lot about how to clear poisons besides hardcoding
>> poison alignment to 0x40 bytes.
>
> It does, but the badblocks tracking should still be reliable, and if
> it's not reliable I expect there are cases where recovery_write() will
> not be triggered because the driver will not fail the
> dax_direct_access() attempt.
thanks!
-jane
Powered by blists - more mailing lists