linux-ext4 - Re: [PATCH v2 5/5] dax: handle media errors in dax_do

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPcyv4h19Cp93f+vQXapnmXLEXHE2RZGyQVo7dCnAqcmnW1GEg@mail.gmail.com>
Date:	Tue, 26 Apr 2016 07:59:10 -0700
From:	Dan Williams <dan.j.williams@...el.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	"Verma, Vishal L" <vishal.l.verma@...el.com>,
	"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
	"jack@...e.cz" <jack@...e.cz>, "axboe@...com" <axboe@...com>,
	"linux-nvdimm@...1.01.org" <linux-nvdimm@...1.01.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"xfs@....sgi.com" <xfs@....sgi.com>,
	"hch@...radead.org" <hch@...radead.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"Wilcox, Matthew R" <matthew.r.wilcox@...el.com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>
Subject: Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

On Tue, Apr 26, 2016 at 1:27 AM, Dave Chinner <david@...morbit.com> wrote:
> On Mon, Apr 25, 2016 at 09:18:42PM -0700, Dan Williams wrote:
[..]
> It seems to me you are focussing on code/technologies that exist
> today instead of trying to define an architecture that is more
> optimal for pmem storage systems. Yes, working code is great, but if
> you can't tell people how things like robust error handling and
> redundancy are going to work in future then it's going to take
> forever for everyone else to handle such errors robustly through the
> storage stack...

Precisely because higher order redundancy is built on top this baseline.

MD-RAID can't do it's error recovery if we don't have -EIO and
clear-error-on-write.  On the other hand, you're absolutely right that
we have a gaping hole on top of the SIGBUS recovery model, and don't
have a kernel layer we can interpose on top of DAX to provide some
semblance of redundancy.

In the meantime, a handful of applications with a team of full-time
site-reliability-engineers may be able to plug in external redundancy
infrastructure on top of what is defined in these patches.  For
everyone else, the hard problem, we need to do a lot more thinking
about a trap and recover solution.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html