linux-kernel - Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <1491830396.2732.3.camel@redhat.com>
Date:   Mon, 10 Apr 2017 09:19:56 -0400
From:   Jeff Layton <jlayton@...hat.com>
To:     NeilBrown <neilb@...e.com>, Matthew Wilcox <willy@...radead.org>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-ext4@...r.kernel.org, akpm@...ux-foundation.org,
        tytso@....edu, jack@...e.cz
Subject: Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking
 infrastructure and convert ext4 to use it

On Mon, 2017-04-10 at 09:15 +1000, NeilBrown wrote:
> On Fri, Apr 07 2017, Jeff Layton wrote:
> 
> > 
> > > ... and then there's no need to update if it's the same errno and nobody's
> > > seen it:
> > > 
> > > 		if (old == new)
> > > 			break;
> > > 
> > 
> > No, we can't do this. The thing could have just been updated by a task
> > that is setting the "seen" bit. We don't want to lose the error here. We
> > always have to do the cmpxchg on the set_wb_error side, I think.
> 
> I don't follow your logic.
> If (old == new) then there was a moment since this function started when
> performing the cmpxchg() would not have changed the contents of memory.
> So let's pretend it did actually happen at that moment, and not change
> memory.
> 
> If we race with a task setting the "seen" bit, then it will have seen
> the error *after* the new error, that this thread is reporting, actually
> happened.  So the result is still correct.
> 

Ok, that does make sense. I'll plan to do that.

There's also a bug in the last patch that I sent. We need to mark the
SEEN bit when we sample the value at open time, so we need a
filemap_sample_wb_error function to grab the current wb_err_t and mark
it SEEN if necessary.

That also gives us a way to handle something like filemap_write_and_wait
(which doesn't take a struct file). We can sample the wb_err_t prior to
starting writeback, and then return an error if anything failed after
that point.

I think that's probably close enough to how the current code works that
we can use it to make drop-in replacements for filemap_write_and_wait*
which should keep us from having to change so much existing code here.

filemap_check_errors would need to take a previously-sampled wb_err_t
argument, but only the lowest-level callers of that and
filemap_fdatawait* would need to deal with them directly.
-- 
Jeff Layton <jlayton@...hat.com>