Message-ID: <CAPcyv4ieKRPP43-FQQS5OfXigSZYoa5mEqiRN9ujj=fe37+e4g@mail.gmail.com>
Date:   Tue, 19 Sep 2017 16:45:38 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Vishal L Verma <vishal.l.verma@...el.com>
Subject: Re: DAX error inject/page poison

On Tue, Sep 19, 2017 at 4:15 PM, Mike Kravetz <mike.kravetz@...cle.com> wrote:
>
> We were trying to simulate pmem errors in an environment where a DAX
> filesystem is used (ext4 although I suspect it does not matter).  The
> sequence attempted on a DAX filesystem is:
> - Populate a file in the DAX filesystem
> - mmap the file
> - madvise(MADV_HWPOISON)
>
> The madvise operation fails with EFAULT.  This appears to come from
> get_user_pages() as there are no struct pages for such mappings?
>
> The idea is to make sure an application can recover from such errors
> by hole punching and repopulating with another page.
>
> A couple questions:
> It seems like madvise(MADV_HWPOISON) is not going to work (ever?) in
> such situations.  If so, should we perhaps add an IS_DAX-like check and
> return something like EINVAL?  Or, at least document expected behavior?

The MADV_HWPOISON machinery assumes normal memory pages, not DAX and
certainly not the special ZONE_DEVICE pages we allocate for the
purpose of DMA. Returning EINVAL seems like the right thing to do
since there is no facility in the kernel to soft offline a DAX page.
In other words, MADV_HWPOISON is for emulating errors in volatile
memory that might be transient until the next reboot. DAX errors cause
permanent data loss in filesystem files, so the error injection and
handling models need to be different.
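The failing sequence Mike describes (populate, mmap, madvise) can be reproduced with a short probe. This is a minimal sketch in Python (3.8 or later, Linux only). It assumes `mmap.madvise()` passes straight through to madvise(2). The names `try_hwpoison` and `classify_madvise_errno` are hypothetical, and the errno interpretations reflect this thread (EFAULT is what Mike observed, EINVAL is what Dan suggests DAX should return), not documented kernel guarantees. Do not point it at a file on ordinary storage while running with CAP_SYS_ADMIN, because on normal memory the call really does poison the page.

```python
import errno
import mmap
import os
import sys

# Linux value of MADV_HWPOISON; used only if this Python build does not
# expose mmap.MADV_HWPOISON itself.
MADV_HWPOISON = getattr(mmap, "MADV_HWPOISON", 100)


def classify_madvise_errno(err: int) -> str:
    """Interpret an madvise(MADV_HWPOISON) failure in the terms of this thread."""
    if err == errno.EFAULT:
        return "page lookup failed (what Mike observed on DAX)"
    if err == errno.EINVAL:
        return "rejected as invalid (what Dan suggests DAX should return)"
    if err == errno.EPERM:
        return "caller lacks CAP_SYS_ADMIN"
    return os.strerror(err)


def try_hwpoison(path: str, size: int = mmap.PAGESIZE) -> str:
    """Populate `path`, mmap it shared, and ask the kernel to poison page 0.

    WARNING: with CAP_SYS_ADMIN on ordinary (page cache) memory this
    really does poison the page.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        os.pwrite(fd, b"\xa5" * size, 0)  # step 1: populate the file
        os.fsync(fd)
        # step 2: shared, writable mapping of the file
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            try:
                m.madvise(MADV_HWPOISON, 0, size)  # step 3
            except OSError as exc:
                return classify_madvise_errno(exc.errno)
        return "madvise succeeded: page poisoned"
    finally:
        os.close(fd)


# Only touch the kernel when a target file is given explicitly.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(try_hwpoison(sys.argv[1]))
```

Usage would look like `python3 hwpoison_probe.py /mnt/dax/testfile`, where the script name and mount point are placeholders. Keeping the errno interpretation in a separate pure function lets it be checked without issuing the dangerous madvise call at all.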

> If madvise(MADV_HWPOISON) will not work, how can one inject errors to
> test error handling code?

Similar to "hdparm --make-bad-sector", we need a platform-specific
facility to inject a hard memory error at a given physical persistent
memory address. On an ACPI 6.2-based platform, that mechanism is
"Section 9.20.7.9 Function Index 7 - ARS Error Inject".
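Whatever mechanism injects the error, the recovery Mike wants to exercise (punch a hole over the damaged range, then repopulate it with another page) needs nothing DAX-specific from userspace. This is a minimal sketch assuming Linux, a libc that exports `fallocate()` reachable through ctypes, and a 4 KiB block size. `punch_and_repopulate` is a hypothetical helper. Whether punching actually retires the bad media is up to the filesystem and pmem driver, not this code.

```python
import ctypes
import os

# fallocate(2) mode flags from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

_libc = ctypes.CDLL(None, use_errno=True)
# Prefer the explicit 64-bit-offset entry point where libc exports it.
_fallocate = getattr(_libc, "fallocate64", None)
if _fallocate is None:
    _fallocate = _libc.fallocate
_fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
_fallocate.restype = ctypes.c_int


def punch_and_repopulate(path: str, offset: int, data: bytes,
                         block: int = 4096) -> None:
    """Punch out [offset, offset + len(data)) and rewrite it with `data`."""
    if offset % block or len(data) % block:
        raise ValueError("offset and length must be multiples of the block size")
    fd = os.open(path, os.O_RDWR)
    try:
        # Deallocate the damaged range without changing the file size.
        rc = _fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                        offset, len(data))
        if rc != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
        # Repopulate the hole with known-good contents from elsewhere.
        if os.pwrite(fd, data, offset) != len(data):
            raise OSError("short write while repopulating " + path)
        os.fsync(fd)
    finally:
        os.close(fd)
```

The write goes through pwrite() rather than the mapping on purpose: after an error, touching the bad range through a load or store risks another machine check, while a fresh write into a punched hole lands on newly allocated blocks.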
