linux-ext4 - Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Mon, 3 Apr 2017 19:45:21 -0700
From:   Matthew Wilcox <willy@...radead.org>
To:     Jeff Layton <jlayton@...hat.com>
Cc:     NeilBrown <neilb@...e.com>, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
        akpm@...ux-foundation.org, tytso@....edu, jack@...e.cz
Subject: Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking
 infrastructure and convert ext4 to use it

On Mon, Apr 03, 2017 at 04:16:17PM -0400, Jeff Layton wrote:
> On Mon, 2017-04-03 at 12:16 -0700, Matthew Wilcox wrote:
> > int filemap_report_wb_error(struct file *file)
> > {
> > 	struct inode *inode = file_inode(file);
> > 	struct address_space *mapping = file->f_mapping;
> > 	int err;
> > 
> > 	spin_lock(&inode->i_lock);
> > 	if (file->f_wb_err == mapping->wb_err) {
> > 		err = 0;
> > 	} else if (mapping->wb_err & 1) {
> > 		filp->f_wb_err = mapping->wb_err & ~2;
> > 		err = -EIO;
> > 	} else {
> > 		filp->f_wb_err = mapping->wb_err;
> > 		err = -ENOSPC;
> > 	}
> > 	spin_unlock(&inode->i_lock);
> > 	return err;
> > }
> > 
> > If I got that right, calling fsync() on an inode which has experienced
> > both errors would first get an EIO.  Calling fsync() on it again would
> > get an ENOSPC.  Calling fsync() on it a third time would get 0.  When
> > either error occurs again, the thread will go back through the cycle
> > (EIO -> ENOSPC -> 0).
> > 
> 
> I don't think so? mapping->wb_err would still have 0x1 set after the
> first call so you'd always end up in the first else if branch.
> 
> It's getting toward beer 30 here though so I could be misreading it.

Well, yes, of course you misread it.  You read what I actually wrote
instead of what I intended to write.  Silly Jeff ...

int filemap_report_wb_error(struct file *file)
{
	struct inode *inode = file_inode(file);
	struct address_space *mapping = file->f_mapping;
	int err;

	spin_lock(&inode->i_lock);
	if (file->f_wb_err == mapping->wb_err) {
		err = 0;
	} else if ((mapping->wb_err ^ file->f_wb_err) == 2) {
		filp->f_wb_err = mapping->wb_err;
		err = -ENOSPC;
	} else {
		filp->f_wb_err = mapping->wb_err & ~2;
		err = -EIO;
	}
	spin_unlock(&inode->i_lock);
	return err;
}

The read side is easier in terms of atomic ...

int filemap_report_wb_error(struct file *file)
{
	unsigned int wb_err = atomic_read(&file->f_mapping->wb_err)

	if (file->f_wb_err == wb_err)
		return 0;
	if ((file->f_wb_err ^ wb_err) == 2) {
		filp->f_wb_err = wb_err;
		return -ENOSPC;
	} else {
		filp->f_wb_err = wb_err & ~2;
		return -EIO;
	}
}

but doing the write side with an atomic looks incredibly painful.  Since
we don't actually need to make the write side scalable, I'd rather see the
write side continue to use a spinlock and do the read side this way:

int filemap_report_wb_error(struct file *file)
{
	unsigned int wb_err = READ_ONCE(file->f_mapping->wb_err)

	if (file->f_wb_err == wb_err)
		return 0;
	if ((file->f_wb_err ^ wb_err) == 2) {
		filp->f_wb_err = wb_err;
		return -ENOSPC;
	} else {
		filp->f_wb_err = wb_err & ~2;
		return -EIO;
	}
}