linux-ext4 - Re: [PATCH 5/5] dax: handle media errors in dax_do

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1458939566.5501.5.camel@intel.com>
Date:	Fri, 25 Mar 2016 20:59:30 +0000
From:	"Verma, Vishal L" <vishal.l.verma@...el.com>
To:	"hch@...radead.org" <hch@...radead.org>
CC:	"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
	"xfs@....sgi.com" <xfs@....sgi.com>,
	"linux-nvdimm@...1.01.org" <linux-nvdimm@...1.01.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
	"Williams, Dan J" <dan.j.williams@...el.com>,
	"axboe@...com" <axboe@...com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"ross.zwisler@...ux.intel.com" <ross.zwisler@...ux.intel.com>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	"Wilcox, Matthew R" <matthew.r.wilcox@...el.com>,
	"david@...morbit.com" <david@...morbit.com>,
	"jack@...e.cz" <jack@...e.cz>
Subject: Re: [PATCH 5/5] dax: handle media errors in dax_do_io

On Fri, 2016-03-25 at 03:45 -0700, Christoph Hellwig wrote:
> On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
> > 
> > dax_do_io (called for read() or write() for a dax file system) may
> > fail
> > in the presence of bad blocks or media errors. Since we expect that
> > a
> > write should clear media errors on nvdimms, make dax_do_io fall
> > back to
> > the direct_IO path, which will send down a bio to the driver, which
> > can
> > then attempt to clear the error.
> Leave the fallback on -EIO to the callers please.  They generally
> call
> __blockdev_direct_IO anyway, so it should actually become simpler
> that
> way.

I thought of this, but made the retrying happen in the wrapper so that
it can be centralized. If the callers were to become responsible for
the retry, then any new callers of dax_do_io might not realize they are
responsible for retrying, and hit problems. Another tricky point might
be: in the wrapper, if __dax_do_io failed with -EIO, and subsequently
__blockdev_direct_IO also fails with a *different* error, I chose to
return -EIO because that was the 'first' error we hit and caused us to
fallback.. (Does this even seem reasonable?) And if so, do we want to
push back this decision too, to the callers?