Message-ID: <20130926191404.GB21811@quack.suse.cz>
Date: Thu, 26 Sep 2013 21:14:04 +0200
From: Jan Kara <jack@...e.cz>
To: James Dingwall <james.dingwall@...stra.com>
Cc: linux-ext4@...r.kernel.org
Subject: Re: ext[234] data corruption (Linux 3.8, 3.9 / Xen)
Hello,
On Thu 26-09-13 08:22:40, James Dingwall wrote:
> >Hi,
> >
> >We have observed a data corruption bug in a database created by
> >the postmap command (BDB file) under the following conditions:
> >
> >Xen domU guest kernel 3.8, 3.9 (3.5, 3.10 and 3.11 don't show the
> >behaviour; 3.6 and 3.7 are unknown)
> >dom0 Xen 4.2.1 / kernel 3.8 or Xen 4.3.0 / kernel 3.11
> >The guest has a passed-through block device (phy:/ or file:/)
> >The filesystem on the passed-through device is ext2/3/4 with a 1k
> >block size
Thanks for the report! So have you really tried with all three filesystems?
And don't you have EXT4_USE_FOR_EXT23 set by any chance (with that option the
ext4 driver can handle ext2/3 mounts too)? There were some changes to the ext4
writeback path and extent status tree, so for ext4 I could understand the
problem being introduced and later fixed. But ext2/3 didn't see any
significant changes for a long time...
> >By examining a strace of the postmap command we produced a short
> >piece of code (at the bottom) which demonstrates the problem. If
> >this is executed in a loop such as:
> >
> >#!/bin/bash
> >for i in $(seq 1 5) ; do
> >    mount /dev/xvde1 /mnt
> >    pushd /mnt > /dev/null
> >    echo "checksums after mount"
> >    md5sum testcase.bin
> >    [ "${i}" = "1" ] && ./a.out
> >    echo "checksums before umount"
> >    md5sum testcase.bin
> >    popd > /dev/null
> >    umount /mnt
> >done
I'll see if I can reproduce this to investigate.
> >The output is
> >
> >checksums after mount
> >md5sum: testcase.bin: No such file or directory
> >checksums before umount
> >719f20c98b69457ce0247d6bf4474cf9 testcase.bin  # the correct checksum for the file
> >checksums after mount
> >a90804e64bcc1c0c98dd2cb23d0e4c10 testcase.bin
> >checksums before umount
> >a90804e64bcc1c0c98dd2cb23d0e4c10 testcase.bin
> >checksums after mount
> >14bb035eca1ec516ce3865700536fc0c testcase.bin
> >checksums before umount
> >14bb035eca1ec516ce3865700536fc0c testcase.bin
> >checksums after mount
> >124d3d3ea8e421925825ff94a815630b testcase.bin
> >checksums before umount
> >124d3d3ea8e421925825ff94a815630b testcase.bin
> >checksums after mount
> >7c05f36ffdd6b8217a27c0bd4d9cb531 testcase.bin
> >checksums before umount
> >7c05f36ffdd6b8217a27c0bd4d9cb531 testcase.bin
> >
> >If we dd out the block device and then loop mount the resulting
> >file we do not see this problem, suggesting that communication
> >between Xen block back/front is ok and that the problem only
> >appears when the device itself is mounted. The default libdb
> >behaviour seems to be to create a database with a block size
> >matching that of the filesystem; if we override this and set it to
> >4k we do not see the issue. The same holds when changing the bs
> >value in our test program: once bs is > 3072 we no longer
> >observe the problem. We can also avoid the issue in our test
> >program by filling in the hole while __testcase.bin is being
> >generated. A similar test on xfs with a 1k block size did not
> >demonstrate this problem. If we make a cp of the file before the
> >umount, the copied version is and remains correct.
> >
> >Our searches have not revealed any similar reports or an
> >explicitly identified fix introduced in 3.10. Our concern
> >therefore is that this is an unrecognised failure that has
> >been inadvertently fixed and could equally inadvertently be
> >reintroduced by some other change. If this problem sounds
> >familiar or there are suggestions on how to narrow this down
> >further we would greatly appreciate the advice.
Well, you can always use 'git bisect' to find the commit that fixed this.
Honza
> >#include <fcntl.h>
> >#include <stdio.h>
> >#include <stdlib.h>
> >#include <string.h>
> >#include <sys/file.h>  /* flock() */
> >#include <sys/stat.h>
> >#include <unistd.h>    /* lseek(), write(), pread(), pwrite(), fdatasync(), close() */
> >
> >int main(int argc, char *argv[])
> >{
> >    struct stat *sbuf;
> >    char *buf, *zero, *null;
> >    int fd5, fd6, fd7;
> >    int i;
> >    int bs = 1024; /* bs <= 3072 triggers the corruption */
> >
> >    buf = malloc(3 * bs);
> >    zero = malloc(3 * bs);
> >    null = malloc(bs);
> >    sbuf = malloc(sizeof(struct stat));
> >    if (!buf || !zero || !null || !sbuf) {
> >        perror("malloc");
> >        return 1;
> >    }
> >    memset(zero, 0, 3 * bs);
> >    memset(sbuf, 0, sizeof(struct stat));
> >
> >    /* repeating 0x00..0x0f pattern; only the first bs bytes are written out */
> >    for (i = 0; i < 3 * bs; i++) {
> >        buf[i] = i & 0x000f;
> >    }
> >
> >    fd5 = open("__testcase.bin", O_RDWR | O_CREAT | O_EXCL, 0644);
> >    if (fd5 < 0) {
> >        perror("open __testcase.bin");
> >        return 1;
> >    }
> >    //fcntl(fd5, F_GETFD);
> >    //fcntl(fd5, F_SETFD, FD_CLOEXEC);
> >    //stat("__testcase.bin", sbuf);
> >    fstat(fd5, sbuf);
> >    /* this only writes the first and last blocks, leaving block 1 a hole */
> >    lseek(fd5, 0 * bs, SEEK_SET);
> >    write(fd5, zero, bs);
> >    //lseek(fd5, 1 * bs, SEEK_SET); /* filling in this hole is a fix! */
> >    //write(fd5, zero, bs);
> >    lseek(fd5, 2 * bs, SEEK_SET);
> >    write(fd5, zero, bs);
> >    fdatasync(fd5);
> >    rename("__testcase.bin", "testcase.bin");
> >
> >    /* reopen the renamed file twice (sequence taken from the postmap strace) */
> >    //stat("testcase.bin", sbuf);
> >    fd6 = open("testcase.bin", O_RDWR | O_CREAT, 0);
> >    if (fd6 < 0) {
> >        perror("open testcase.bin");
> >        return 1;
> >    }
> >    //fcntl(fd6, F_GETFD);
> >    //fcntl(fd6, F_SETFD, FD_CLOEXEC);
> >    //fstat(fd6, sbuf);
> >    pread(fd6, null, bs, 0);
> >    //fstat(fd6, sbuf);
> >    //fcntl(fd6, F_GETFD);
> >    //fcntl(fd6, F_SETFD, FD_CLOEXEC);
> >    //fcntl(fd6, F_GETFD);
> >    //fcntl(fd6, F_SETFD, FD_CLOEXEC);
> >    fd7 = open("testcase.bin", O_RDWR);
> >    if (fd7 < 0) {
> >        perror("open testcase.bin");
> >        return 1;
> >    }
> >    flock(fd7, LOCK_EX);
> >    umask(022);
> >    /* read the hole and the last block, then rewrite all three blocks */
> >    pread(fd6, null, bs, 1 * bs);
> >    pread(fd6, null, bs, 2 * bs);
> >    pwrite(fd6, buf, bs, 0 * bs);
> >    pwrite(fd6, buf, bs, 1 * bs);
> >    pwrite(fd6, buf, bs, 2 * bs);
> >    fdatasync(fd6);
> >    fdatasync(fd6);
> >    close(fd5);
> >    close(fd6);
> >
> >    fd5 = open("testcase.bin", O_RDWR, 0);
> >    //fcntl(fd5, F_GETFD);
> >    //fcntl(fd5, F_SETFD, FD_CLOEXEC);
> >    fdatasync(fd5);
> >    close(fd5);
> >
> >    close(fd7);
> >
> >    free(buf);
> >    free(sbuf);
> >    free(zero);
> >    free(null);
> >    return 0;
> >}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR