linux-kernel - Re: [PATCH 0/3] mm/fs: get PG_error out of the writeback reporting business

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1488724854.2925.6.camel@redhat.com>
Date:   Sun, 05 Mar 2017 09:40:54 -0500
From:   Jeff Layton <jlayton@...hat.com>
To:     viro@...iv.linux.org.uk, konishi.ryusuke@....ntt.co.jp
Cc:     linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-nilfs@...r.kernel.org,
        NeilBrown <neilb@...e.com>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>
Subject: Re: [PATCH 0/3] mm/fs: get PG_error out of the writeback reporting
 business

On Sun, 2017-03-05 at 08:35 -0500, Jeff Layton wrote:
> I recently did some work to wire up -ENOSPC handling in ceph, and found
> I could get back -EIO errors in some cases when I should have instead
> gotten -ENOSPC. The problem was that the ceph writeback code would set
> PG_error on a writeback error, and that error would clobber the mapping
> error.
> 

I should also note that relying on PG_error to report writeback errors
is inherently unreliable as well. If someone calls sync() before your
fsync gets in there, then you'll likely lose it anyway.

filemap_fdatawait_keep_errors will preserve the error in the mapping,
but not the individual PG_error flags, so I think we do want to ensure
that the mapping error is set when there is a writeback error and not
rely on PG_error bit for that.

> While I fixed that problem by simply not setting that bit on errors,
> that led me down a rabbit hole of looking at how PG_error is being
> handled in the kernel.
> 
> This patch series is a few fixes for things that I 100% noticed by
> inspection. I don't have a great way to test these since they involve
> error handling. I can certainly doctor up a kernel to inject errors
> in this code and test by hand however if these look plausible up front.
> 
> Jeff Layton (3):
>   nilfs2: set the mapping error when calling SetPageError on writeback
>   mm: don't TestClearPageError in __filemap_fdatawait_range
>   mm: set mapping error when launder_pages fails
> 
>  fs/nilfs2/segment.c |  1 +
>  mm/filemap.c        | 19 ++++---------------
>  mm/truncate.c       |  6 +++++-
>  3 files changed, 10 insertions(+), 16 deletions(-)
> 

(cc'ing Ross...)

Just when I thought that only NILFS2 needed a little work here, I see
another spot...

I think that we should also need to fix dax_writeback_mapping_range to
set a mapping error on writeback as well. It looks like that's not
happening today. Something like the patch below (obviously untested).

I'll also plan to follow up with a patch to vfs.txt to outline how
writeback errors should be handled by filesystems, assuming that this
patchset isn't completely off base.

-------------------8<-----------------------

[PATCH] dax: set error in mapping when writeback fails

In order to get proper error codes from fsync, we must set an error in
the mapping range when writeback fails.

Signed-off-by: Jeff Layton <jlayton@...hat.com>
---
 fs/dax.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index c45598b912e1..9005d90deeda 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -888,8 +888,10 @@ int dax_writeback_mapping_range(struct address_space *mapping,

 			ret = dax_writeback_one(bdev, mapping, indices[i],
 					pvec.pages[i]);
-			if (ret < 0)
+			if (ret < 0) {
+				mapping_set_error(mapping, ret);
 				return ret;
+			}
 		}
 	}
 	return 0;
-- 
2.9.3