Message-ID: <e0a4feaa-e3b5-5b92-203c-ca9339282820@intel.com>
Date:   Wed, 3 May 2017 14:56:21 -0700
From:   Dave Jiang <dave.jiang@...el.com>
To:     Dan Williams <dan.j.williams@...el.com>,
        "Kani, Toshimitsu" <toshi.kani@....com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>
Subject: Re: [RFC PATCH] dax: add badblocks check to Device DAX

On 05/03/2017 02:48 PM, Dan Williams wrote:
> On Wed, May 3, 2017 at 11:46 AM, Kani, Toshimitsu <toshi.kani@....com> wrote:
>> On Wed, 2017-05-03 at 09:30 -0700, Dan Williams wrote:
>>> On Wed, May 3, 2017 at 9:09 AM, Kani, Toshimitsu <toshi.kani@....com>
>>> wrote:
>>>> On Wed, 2017-05-03 at 08:52 -0700, Dan Williams wrote:
>>>>> On Wed, May 3, 2017 at 8:31 AM, Toshi Kani <toshi.kani@....com>
>>>>> wrote:
>>>>>> This is an RFC patch seeking suggestions.  It adds support
>>>>>> of badblocks check in Device DAX by using region-level
>>>>>> badblocks list.  This patch is only briefly tested.
>>>>>>
>>>>>> device_dax is a well-isolated self-contained module as it calls
>>>>>> alloc_dax() with dev_dax, which is private to device_dax.  For
>>>>>> checking badblocks, it needs to call dax_pmem to check with
>>>>>> region-level badblocks.
>>>>>>
>>>>>> This patch attempts to keep device_dax self-contained.  It adds
>>>>>> check_error() to dax_operations, and dax_check_error() as a
>>>>>> stub with *dev_dax and *dev pointers to convey it to
>>>>>> dax_pmem.  I am wondering if this is the right direction, or we
>>>>>> should change the modularity to let dax_pmem call alloc_dax()
>>>>>> with its dax_pmem (or I completely missed something).
>>>>>
>>>>> The problem is that device-dax guarantees a given fault
>>>>> granularity. To make that guarantee we can't fallback from 1G or
>>>>> 2M mappings due to an error. We also can't reasonably go the
>>>>> other way and fail mappings that contain a badblock because that
>>>>> would change the blast radius of a media error to the fault size.
>>>>
>>>> Does it mean we expect users to have CPUs with MCE recovery for
>>>> Device DAX?  Can we add an attribute like allow error-check &
>>>> fall-back?
>>>
>>> Yes, without MCE recovery, consuming an error through a device-dax
>>> mapping will reboot the system. If an application needs kernel
>>> protection, it should be using filesystem-dax.
>>
>> Understood.  Are we going to provide sysfs "badblocks" for Device DAX
>> as it is also needed for ndctl clear-error?
> 
> No, I had started that way, but badblocks really needs write(2) or
> fallocate(PUNCH_HOLE) support for clearing errors. Since we don't want
> to support write(2) and were NAKd from supporting fallocate() the only
> interface that was left was sending clear-error-DSM ioctls directly to
> the nvdimm bus. Since that is a very libnvdimm specific interface it
> made sense to then add badblocks at the libnvdimm-region level. The
> "ndctl clear-error" command is there to do the translation of error
> offsets in user space and supersedes the need for the kernel to carry
> a badblocks file for device-dax.
> 

Toshi, I'm also working on ndctl list-errors in relation to dev dax so
that you get a list of badblocks that are fixed up for dev dax.
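
For illustration only, the user-space translation of error offsets
discussed in this thread might look roughly like the sketch below. It is
not ndctl's implementation. It assumes region-level badblocks arrive as
(sector, count) pairs in 512-byte sectors relative to the start of the
region, and that the device-dax instance covers a known byte range
inside that region. The function names and the dev_offset, dev_size and
align parameters are hypothetical. The second helper shows the "blast
radius" concern: a single bad sector touches a whole mapping of the
fault size.

```python
SECTOR_SIZE = 512  # assumption: badblocks are expressed in 512-byte sectors


def region_badblocks_to_dev_ranges(badblocks, dev_offset, dev_size):
    """Translate region-relative (sector, count) pairs into
    device-relative (byte_offset, byte_length) ranges.

    dev_offset and dev_size are the byte offset and byte size of the
    device-dax instance within the region (hypothetical inputs).
    Ranges outside the device are dropped; ranges that straddle a
    device boundary are clipped to it.
    """
    dev_end = dev_offset + dev_size
    ranges = []
    for sector, count in badblocks:
        start = sector * SECTOR_SIZE
        end = start + count * SECTOR_SIZE
        lo = max(start, dev_offset)
        hi = min(end, dev_end)
        if lo < hi:
            ranges.append((lo - dev_offset, hi - lo))
    return ranges


def affected_mappings(ranges, align):
    """Return the start offsets of the align-sized mappings that
    contain at least one byte of any bad range."""
    hit = set()
    for offset, length in ranges:
        first = offset // align
        last = (offset + length - 1) // align
        hit.update(range(first, last + 1))
    return [index * align for index in sorted(hit)]


if __name__ == "__main__":
    MiB = 1024 * 1024
    region_badblocks = [(8, 2), (4096, 1), (10_000_000, 4)]
    dev_ranges = region_badblocks_to_dev_ranges(
        region_badblocks, dev_offset=1 * MiB, dev_size=4 * MiB
    )
    print(dev_ranges)
    print(affected_mappings(dev_ranges, align=2 * MiB))
```

With the sample values, only the middle entry falls inside the device,
and that one 512-byte bad sector lands in a single 2 MiB mapping. Failing
the whole mapping would therefore make 2 MiB unusable for a 512-byte
media error.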
