linux-kernel - Re: [PATCH] Add support of NVDIMM memory error notification in ACPI 6.2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b91c4a7b-3e19-df5d-12e7-61cdf1bc2c84@hpe.com>
Date:   Thu, 8 Jun 2017 13:30:51 -0400
From:   Linda Knippers <linda.knippers@....com>
To:     "Kani, Toshimitsu" <toshi.kani@....com>,
        "dan.j.williams@...el.com" <dan.j.williams@...el.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
        "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
        "rjw@...ysocki.net" <rjw@...ysocki.net>,
        "vishal.l.verma@...el.com" <vishal.l.verma@...el.com>
Subject: Re: [PATCH] Add support of NVDIMM memory error notification in ACPI
 6.2



On 6/7/2017 5:33 PM, Kani, Toshimitsu wrote:
> On Wed, 2017-06-07 at 14:06 -0700, Dan Williams wrote:
>> On Wed, Jun 7, 2017 at 1:57 PM, Kani, Toshimitsu <toshi.kani@....com>
>> wrote:
>>> On Wed, 2017-06-07 at 12:09 -0700, Dan Williams wrote:
>>>> On Wed, Jun 7, 2017 at 11:49 AM, Toshi Kani <toshi.kani@....com>
>>>> wrote:
>>>
>>>  :
>>>>> +
>>>>> +static void acpi_nfit_uc_error_notify(struct device *dev,
>>>>> acpi_handle handle)
>>>>> +{
>>>>> +       struct acpi_nfit_desc *acpi_desc =
>>>>> dev_get_drvdata(dev);
>>>>> +
>>>>> +       acpi_nfit_ars_rescan(acpi_desc);
>>>>
>>>> I wonder if we should gate re-scanning with a similar:
>>>>
>>>>     if (acpi_desc->scrub_mode == HW_ERROR_SCRUB_ON)
>>>>
>>>> ...check that we do in the mce notification case? Maybe not since
>>>> we
>>>> don't get an indication of where the error is without a rescan.
>>>
>>> I think this mce case is different since the MCE handler already
>>> knows where the new poison location is and can update badblocks
>>> information for it.  Starting ARS is an optional precaution.
>>>
>>>> However, at a minimum I think we need support for the new Start
>>>> ARS flag ("If set to 1 the firmware shall return data from a
>>>> previous scrub, if any, without starting a new scrub") and use
>>>> that for this case.
>>>
>>> That's an interesting idea.  But I wonder how users know if it is
>>> OK to set this flag as it relies on BIOS implementation that is not
>>> described in ACPI...
>>
>> Ugh, you're right. We might need a revision-id=2 version of Start ARS
>> so software knows that the BIOS is aware of the new flag.
>
> My bad.  Looking at ACPI 6.2, it actually defines what you described.
> Start ARS now defines bit[1] in Flags which can be set to avoid
> scanning for this notification.  I will update the patch to set this
> flag when HW_ERROR_SCRUB_ON is not set.

Wasn't Dan concerned about how the OS can know whether the FW supports
that bit in the Start ARS?

The Query ARS Capabilities DSM has a bit that tells the OS whether the
platform supports the notification and the point of the notification was
to tell the OS it could do a Start ARS with bit 1 set.  Of course, if
you get the notification then that means the platform has the capability
to deliver it, but it might not hurt to check the flag from the Query
Capabilities bit.

If the spec isn't clear about the relationship between the bits, we could
clarify that in the spec without bumping the revision number for the call.

We could also check to see what a platform does if it gets a flag it
doesn't know about.  Would that be an Invalid Input Parameter?

-- ljk
>
> Thanks,
> -Toshi.��칻.�&�~�&�.��+-��ݶ.��w��˛���m�b��Zr����^n�r���z�.��h����&��.�G���h�.(�階�ݢj"��.�.m�����z�ޖ���f���h���~�m�
>