[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4hK18DetEf9+NcDqM5y07Vp-=nhysHJ3JSnKbS-ET2ppw@mail.gmail.com>
Date: Tue, 2 Nov 2021 09:03:55 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: Jane Chu <jane.chu@...cle.com>,
"david@...morbit.com" <david@...morbit.com>,
"djwong@...nel.org" <djwong@...nel.org>,
"vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
"dave.jiang@...el.com" <dave.jiang@...el.com>,
"agk@...hat.com" <agk@...hat.com>,
"snitzer@...hat.com" <snitzer@...hat.com>,
"dm-devel@...hat.com" <dm-devel@...hat.com>,
"ira.weiny@...el.com" <ira.weiny@...el.com>,
"willy@...radead.org" <willy@...radead.org>,
"vgoyal@...hat.com" <vgoyal@...hat.com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
On Tue, Oct 26, 2021 at 11:50 PM Christoph Hellwig <hch@...radead.org> wrote:
>
> On Fri, Oct 22, 2021 at 08:52:55PM +0000, Jane Chu wrote:
> > Thanks - I try to be honest. As far as I can tell, the argument
> > about the flag is a philosophical argument between two views.
> > One view assumes design based on perfect hardware, and media error
> > belongs to the category of brokenness. Another view sees media
> > error as a build-in hardware component and make design to include
> > dealing with such errors.
>
> No, I don't think so. Bit errors do happen in all media, which is
> why devices are built to handle them. It is just the Intel-style
> pmem interface to handle them which is completely broken.
No, any media can report checksum / parity errors. NVME also seems to
do a poor job with multi-bit ECC errors consumed from DRAM. There is
nothing "pmem" or "Intel" specific here.
> > errors in mind from start. I guess I'm trying to articulate why
> > it is acceptable to include the RWF_DATA_RECOVERY flag to the
> > existing RWF_ flags. - this way, pwritev2 remain fast on fast path,
> > and its slow path (w/ error clearing) is faster than other alternative.
> > Other alternative being 1 system call to clear the poison, and
> > another system call to run the fast pwrite for recovery, what
> > happens if something happened in between?
>
> Well, my point is doing recovery from bit errors is by definition not
> the fast path. Which is why I'd rather keep it away from the pmem
> read/write fast path, which also happens to be the (much more important)
> non-pmem read/write path.
I would expect this interface to be useful outside of pmem as a
"failfast" or "try harder to recover" flag for reading over media
errors.
Powered by blists - more mailing lists