lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1491237058.2673.3.camel@redhat.com>
Date:   Mon, 03 Apr 2017 12:30:58 -0400
From:   Jeff Layton <jlayton@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-ext4@...r.kernel.org, akpm@...ux-foundation.org,
        tytso@....edu, jack@...e.cz, neilb@...e.com
Subject: Re: [RFC PATCH 1/4] fs: new infrastructure for writeback error
 handling and reporting

On Mon, 2017-04-03 at 09:15 -0700, Matthew Wilcox wrote:
> On Mon, Apr 03, 2017 at 11:19:51AM -0400, Jeff Layton wrote:
> > Yes, so just to be clear here if you bump a 32 bit counter every
> > microsecond you'll end up wrapping in a little over an hour. How fast
> > can DAX generate I/O errors? :)
> 
> I admit to not having picked through the code, but how often do we try
> to do writebacks?  And how often do we retry writebacks once an -EIO
> has happened?  Once we mark a page as PG_error, do we keep trying to
> write it back and set the AS error each time?
> 

It depends, but I think it could theoretically happen after trying to
sync out every page in a file. With something like DAX it seems like
you could do that pretty quickly.

One thing we could do is to try and push the filemap_set_wb_error calls
out of writepage ops and allow the callers to do that so we can avoid
bumping the counter unnecessarily. Not sure if that's enough to avoid
wrapping too quickly.

> > I'm fine with a 32 bit counter (and even with using the low order bits
> > to store error flags) if we're ok with that limitation. The big
> > question there is whether it's ok to continue reporting -EIO when there
> > has actually been nothing but -ENOSPC errors since the last fsync. I
> > think it's a corner case that's not of terribly great concern so I'm
> > fine with that.
> 
> Yeah, I was thinking about that, and I'm fine with it too.
> 
> > We could try to mitigate it by zeroing out the value when i_writecount
> > goes to zero though. Then if you close all of the fds on the file, the
> > error is cleared. Or maybe we could add a new ioctl to explicitly zero
> > it out?
> 
> I'm OK with zeroing the wb_err once i_writecount drops to 0.  Everybody
> who cares has already been notified.  The new ioctl feels like overkill.

That's my feeling too.
-- 
Jeff Layton <jlayton@...hat.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ