lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YYK+hfmB05URYO75@infradead.org>
Date:   Wed, 3 Nov 2021 09:53:25 -0700
From:   Christoph Hellwig <hch@...radead.org>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     Christoph Hellwig <hch@...radead.org>,
        Jane Chu <jane.chu@...cle.com>,
        "david@...morbit.com" <david@...morbit.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
        "dave.jiang@...el.com" <dave.jiang@...el.com>,
        "agk@...hat.com" <agk@...hat.com>,
        "snitzer@...hat.com" <snitzer@...hat.com>,
        "dm-devel@...hat.com" <dm-devel@...hat.com>,
        "ira.weiny@...el.com" <ira.weiny@...el.com>,
        "willy@...radead.org" <willy@...radead.org>,
        "vgoyal@...hat.com" <vgoyal@...hat.com>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with
 RWF_RECOVERY_DATA flag

On Tue, Nov 02, 2021 at 09:03:55AM -0700, Dan Williams wrote:
> > why devices are built to handle them.  It is just the Intel-style
> > pmem interface to handle them which is completely broken.
> 
> No, any media can report checksum / parity errors. NVME also seems to
> do a poor job with multi-bit ECC errors consumed from DRAM. There is
> nothing "pmem" or "Intel" specific here.

If you do get data corruption from NVMe (which yes can happen despite
the typical very good UBER rate) you just write over it again.  You
don't need to magically whack the underlying device.  Same for hard
drives.

> > Well, my point is doing recovery from bit errors is by definition not
> > the fast path.  Which is why I'd rather keep it away from the pmem
> > read/write fast path, which also happens to be the (much more important)
> > non-pmem read/write path.
> 
> I would expect this interface to be useful outside of pmem as a
> "failfast" or "try harder to recover" flag for reading over media
> errors.

Maybe we need to sit down and define useful semantics then?  The problem
on the write side isn't really that the behavior with the flag is
undefined, it is more that writes without the flag have horrible
semantics if they don't just clear the error.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ