[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 12 Dec 2011 03:27:08 +0100
From: Andrea Arcangeli <aarcange@...hat.com>
To: linux-ext4@...r.kernel.org
Cc: Theodore Tso <tytso@....EDU>, Jan Kara <jack@...e.cz>
Subject: [PATCH 2/2] ext4: ext4_discard_partial_page_buffers_no_lock() wrong parameters
If "copied" is zero (it can happen if the pte is unmapped before the
atomic copy_user that does the data copy runs) the "from" passed to
ext4_discard_partial_page_buffers_no_lock() points to pos-1, which
would correspond to a logical page index before the page->index
leading to ext4_discard_partial_page_buffers_no_lock() returning
-EINVAL (because index != page->index). In such a case write() returns
-EINVAL and userland gets a failure and filemap.c doesn't retry the
copy_user anymore.
I'm not certain of why exactly
ext4_discard_partial_page_buffers_no_lock() is run here, so it's hard
to tell if this is the correct fix. But it that functions clears data
starting from the "from" parameter, so regardless of the -EINVAL
retval, the right "from" to start clearing data should be pos+copied,
not pos+copied-1. If this assumption is correct, it could mean that
this bug in addition to the -EINVAL error, could also zero out 1 byte
by mistake. I'm not sure what the implications for that are (not sure
if data corruption is possible in some circumstances because of
that). I guess normally this functions runs on unmapped buffers and
the EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED makes it a noop on those.
After fixing the hang in ext4_da_should_update_i_disksize, this write
-EINVAL error becomes trivially reproducible with experimental knumad
autonuma code running at heavy frequency (not the normal case). But
like for the ext4_da_should_update_i_disksize hang, it should not be
impossible to reproduce it with legacy swapping behavior.
After dropping all caches a md5sum is successful and I can't find
errors anymore with this patch, the -EINVAL stops, but it's not
conclusive (and I haven't run e2fsck -f yet but I doubt this affects
metadata coherency, seems more like a delayalloc data issue).
Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
---
fs/ext4/inode.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 63f9541..528c4c5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2534,11 +2534,11 @@ static int ext4_da_write_end(struct file *file,
page, fsdata);
page_len = PAGE_CACHE_SIZE -
- ((pos + copied - 1) & (PAGE_CACHE_SIZE - 1));
+ ((pos + copied) & (PAGE_CACHE_SIZE - 1));
- if (page_len > 0) {
+ if (page_len < PAGE_CACHE_SIZE) {
ret = ext4_discard_partial_page_buffers_no_lock(handle,
- inode, page, pos + copied - 1, page_len,
+ inode, page, pos + copied, page_len,
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists