linux-kernel - Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87efx6tnbr.fsf@notabene.neil.brown.name>
Date:   Thu, 06 Apr 2017 10:02:48 +1000
From:   NeilBrown <neilb@...e.com>
To:     Jeff Layton <jlayton@...hat.com>,
        Matthew Wilcox <willy@...radead.org>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-ext4@...r.kernel.org, akpm@...ux-foundation.org,
        tytso@....edu, jack@...e.cz
Subject: Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

On Thu, Apr 06 2017, Jeff Layton wrote:

> On Tue, 2017-04-04 at 10:09 -0700, Matthew Wilcox wrote:
>> On Tue, Apr 04, 2017 at 12:25:46PM -0400, Jeff Layton wrote:
>> > That said, I think giving more specific errors where we can is useful.
>> > When your program is erroring out and writing 'I/O error' to the logs,
>> > then how much time will your admins burn before they figure out that it
>> > really failed because the filesystem was full?
>> 
>> df is one of the first things I check ... a few years ago, I also learned
>> to check df -i ... ;-)
>> 
>> Anyway, given the decision to simply report the last error lets us do this
>> implementation:
>> 
>> void filemap_set_wb_error(struct address_space *mapping, int err)
>> {
>> 	struct inode *inode = mapping->host;
>> 	unsigned int wb_err;
>> 
>> 	if (!err)
>> 		return;
>> 	/*
>> 	 * This should be called with the error code that we want to return
>> 	 * on fsync. Thus, it should always be <= 0.
>> 	 */
>> 	WARN_ON(err > 0 || err < -MAX_ERRNO);
>> 
>> 	spin_lock(&inode->i_lock);
>> 	wb_err = ((mapping->wb_err & ~MAX_ERRNO) + (1 << 12)) | -err;
>> 	WRITE_ONCE(mapping->wb_err, wb_err);
>> 	spin_unlock(&inode->i_lock);
>> }
>> 
>
> I like this idea of being able to store arbitrary error codes there.
> That should be used judiciously of course, but we already allow
> returning arbitrary errors via the ->fsync op anyway.
>
> I'll plan to incorporate something like that into the next set (with
> judicious comments and constants).
>
> One question...is the i_lock the right way to protect this? I think we
> could do this locklessly too (cmpxchg in a loop, for instance). I'm not
> worried about performance here -- it's just nice to be able to call
> simple stuff like this without worrying about locking.

I like the idea of using cmpxchg.


>
>> int filemap_report_wb_error(struct file *file)
>> {
>> 	struct inode *inode = file_inode(file);
>> 	unsigned int wb_err = READ_ONCE(mapping->wb_err);
>> 
>> 	if (file->f_wb_err == wb_err)
>> 		return 0;
>> 	return -(wb_err & 4095);
>> }
>> 
>> That only gives us 20 bits of counter, but I think that's enough.
>
> 2^20 is 1048576, which seems a little small to me.
>
> We may end up bumping the counter on every failed I/O. How fast can we
> generate 1M failed I/Os? :)

Do we need to count all of those if no-one sees them?
i.e. use one bit to say "this error hasn't been seen".
If an error occurs with has the name error code as is currently stored,
and the bit is set, don't make a change.  Otherwise make the change,
inc the counter, set the bit.
When checking for an error, if the bit is set, clear it first.
Then you can count 500,000 errors-returned-to-some-thread, which is
probably enough.

>
> 2^52 however is 4503599627370496 (4Tios or so) ... that might take a
> little longer to overflow. Is it worth the cost here to ensure that
> this won't occur?
>
> Actually...we could put this field in the inode instead of the mapping.
> I know we've traditionally tracked this in the mapping, but is that
> required here?

What if the address_space is shared by two inodes?  That is the whole
point of the i_mapping pointer.  This would make it harder for the
"other" inode to get the error.
(Does anyone actually use fs/coda ??
 Actually, block devices use i_mapping too.
 If two block device inodes have the same major/minor number, they
 end up having i_mapping point to the same place)

If you are concerned about space in 'struct address_space', just prune
some wastage.
The "host" field brings no value.  It is only ever assigned in
inode_init_always():

	struct address_space *const mapping = &inode->i_data;
......
	mapping->host = inode;

So you could change all references to use
   container_of(mapping, struct inode, i_data)

NeilBrown



>
> If we put this field in the inode then perhaps we can union it with
> something and mitigate the cost of a larger counter...maybe in the
> i_pipe union? I don't think S_ISREG inodes use anything in there, do
> they?
> -- 
> Jeff Layton <jlayton@...hat.com>

Download attachment "signature.asc" of type "application/pgp-signature" (833 bytes)