lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4hK18DetEf9+NcDqM5y07Vp-=nhysHJ3JSnKbS-ET2ppw@mail.gmail.com>
Date:   Tue, 2 Nov 2021 09:03:55 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     Jane Chu <jane.chu@...cle.com>,
        "david@...morbit.com" <david@...morbit.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
        "dave.jiang@...el.com" <dave.jiang@...el.com>,
        "agk@...hat.com" <agk@...hat.com>,
        "snitzer@...hat.com" <snitzer@...hat.com>,
        "dm-devel@...hat.com" <dm-devel@...hat.com>,
        "ira.weiny@...el.com" <ira.weiny@...el.com>,
        "willy@...radead.org" <willy@...radead.org>,
        "vgoyal@...hat.com" <vgoyal@...hat.com>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag

On Tue, Oct 26, 2021 at 11:50 PM Christoph Hellwig <hch@...radead.org> wrote:
>
> On Fri, Oct 22, 2021 at 08:52:55PM +0000, Jane Chu wrote:
> > Thanks - I try to be honest.  As far as I can tell, the argument
> > about the flag is a philosophical argument between two views.
> > One view assumes design based on perfect hardware, and media error
> > belongs to the category of brokenness. Another view sees media
> > error as a build-in hardware component and make design to include
> > dealing with such errors.
>
> No, I don't think so.  Bit errors do happen in all media, which is
> why devices are built to handle them.  It is just the Intel-style
> pmem interface to handle them which is completely broken.

No, any media can report checksum / parity errors. NVME also seems to
do a poor job with multi-bit ECC errors consumed from DRAM. There is
nothing "pmem" or "Intel" specific here.

> > errors in mind from start.  I guess I'm trying to articulate why
> > it is acceptable to include the RWF_DATA_RECOVERY flag to the
> > existing RWF_ flags. - this way, pwritev2 remain fast on fast path,
> > and its slow path (w/ error clearing) is faster than other alternative.
> > Other alternative being 1 system call to clear the poison, and
> > another system call to run the fast pwrite for recovery, what
> > happens if something happened in between?
>
> Well, my point is doing recovery from bit errors is by definition not
> the fast path.  Which is why I'd rather keep it away from the pmem
> read/write fast path, which also happens to be the (much more important)
> non-pmem read/write path.

I would expect this interface to be useful outside of pmem as a
"failfast" or "try harder to recover" flag for reading over media
errors.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ