[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080523203429.11846.84137.stgit@localhost.localdomain>
Date: Fri, 23 May 2008 22:34:29 +0200
From: Guillaume Chazarain <guichaz@...il.com>
To: akpm@...ux-foundation.org
Cc: linux-kernel@...r.kernel.org
Subject: [PATCH] fs-writeback: handle errors in sync_sb_inodes()
Currently it is possible for some errors to be detected at write-back
time but not reported to the program as shown by the following script
using the included make_file.c. Undetected ENOSPC or EIO can obviously lead to
data corruption.
---------8<---------8<---------8<---------8<---------8<---------8<---------
#!/bin/bash
# We binary search the size of a file in a 40M filesystem that can cause
# the missed error.
MIN=5000000
MAX=50000000
rm fs.40M
dd if=/dev/zero of=fs.40M bs=40M count=0 seek=1 status=noxfer
#mkfs.ext2 -F fs.40M
mkfs.ext3 -F fs.40M
#mkfs.jfs -q fs.40M
#mkfs.reiserfs -fq fs.40M
#mkfs.xfs fs.40M # SIGBUS in memset() with patch
# XFS => OK for ENOSPC, but not for EIO
attempt()
{
SIZE=$1
RES=0
mount fs.40M /mnt -o loop
if ! ./make_file /mnt/not_enough_space $SIZE; then
# We could not create the file as the requested size
# was clearly too big
echo "/mnt/not_enough_space: not enough space for $SIZE bytes"
RES=1
fi
umount /mnt
if [ $RES -eq 0 ]; then
# We think we could create the file, let's see if that's true
mount fs.40M /mnt -o loop
cmp <(tr '\000' '\377' < /dev/zero | head -c$SIZE) \
/mnt/not_enough_space
RES=$?
umount /mnt
if [ $RES -eq 0 ]; then
# The file was small enough to fit in the filesystem
RES=-1
else
echo "Undetected ENOSPC with SIZE=$SIZE"
exit
fi
fi
return $RES
}
while [ $((MAX - MIN)) -gt 1 ]; do
SIZE=$(((MIN + MAX) / 2))
attempt $SIZE
RES=$?
if [ $RES -eq 1 ]; then
MAX=$SIZE
else
MIN=$SIZE
fi
done
echo "Could not reproduce the problem"
---------8<---------8<---------8<---------8<---------8<---------8<---------
/* make_file.c */
#include <unistd.h>
#include <sys/fcntl.h>
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
int size, fd;
char *mapping;
if (argc != 3) {
fprintf(stderr, "Usage: %s FILE SIZE\n", argv[0]);
return 1;
}
size = atoi(argv[2]);
fd = open(argv[1], O_RDWR | O_CREAT, 0600);
if (fd < 0) {
perror(argv[1]);
return 1;
}
if (ftruncate(fd, size) < 0) {
perror("ftruncate");
return 1;
}
mapping = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
if (mapping == MAP_FAILED) {
perror("mmap");
return 1;
}
memset(mapping, 0xFF, size);
/* Force a write-back */
sync();
if (msync(mapping, size, MS_SYNC) < 0) {
perror("msync");
return 1;
}
if (close(fd) < 0) {
perror("close");
return 1;
}
printf("%s: successfully written %d bytes\n", argv[1], size);
return 0;
}
---------8<---------8<---------8<---------8<---------8<---------8<---------
make_file.c mmaps a hole, performs some writeback (memset + sync) and
then expects to find some error code in msync(). The script mounts a
40M loopback filesystem and does a binary search to find the size of
a file big enough to provoke a ENOSPC, but small enough to show the
error not being detected at msync() time.
The error window is large enough for such a size to be quickly found, but with
this patch, no such file size can be found.
All mmap capable filesystems I tested are affected (ext2, ext3,
jfs, reiserfs, xfs). XFS is special in that it survives the test thanks
to the page_mkwrite() work, i.e. it SIGBUS during memset. Anyway, this XFS
behaviour solves ENOSPC but does nothing for EIO.
The offending code is in fs/fs-writeback.c:
sync_sb_inodes(...)
{
...
__writeback_single_inode(inode, wbc);
...
}
__writeback_single_inode() gets the error from mapping->flags, clears it
and returns it. But sync_sb_inodes() ignores this return value. sync_sb_inodes()
should leave the error in each mapping instead of propagating it through the
call stack for the following reasons:
- the propagated error would combine the errors in all the synced inodes, so it
would just tell that some inode in a specific fs got an error,
- nobody in the call stack would be interested by this error: certainly not
pdflush, or 'void sync(2)'.
Signed-off-by: Guillaume Chazarain <guichaz@...il.com>
---
fs/fs-writeback.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 495214d..0e6881e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -454,6 +454,7 @@ void generic_sync_sb_inodes(struct super_block *sb,
struct address_space *mapping = inode->i_mapping;
struct backing_dev_info *bdi = mapping->backing_dev_info;
long pages_skipped;
+ int err;
if (!bdi_cap_writeback_dirty(bdi)) {
redirty_tail(inode);
@@ -498,7 +499,8 @@ void generic_sync_sb_inodes(struct super_block *sb,
BUG_ON(inode->i_state & I_FREEING);
__iget(inode);
pages_skipped = wbc->pages_skipped;
- __writeback_single_inode(inode, wbc);
+ err = __writeback_single_inode(inode, wbc);
+ mapping_set_error(mapping, err);
if (wbc->sync_mode == WB_SYNC_HOLD) {
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &sb->s_dirty);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists