Date:	Thu, 18 Aug 2011 11:19:41 +0200
From:	Christian Brunner <>
Cc:	Sage Weil <>, Theodore Tso <>,
	Fyodor Ustinov <>,
Subject: Re: Kernel 3.0.0 + ext4 + ceph == ...

I'm sorry that I have to correct this:

The problem is still happening with 3.0.1, although it now seems to
happen only under high load.

I also did some tracing (with 3.0.0, as the problem is easier to
reproduce there). What might be interesting to note is that the
corruption does not occur when I do an "strace -f cosd". (Maybe a
race condition?)

To reproduce the problem I have now set up a ceph cluster on a single machine
with replication between /ceph/osd.000 and /ceph/osd.001.

My setup now has only two active placement groups with 2 objects.

The corruption happens when I start replication from osd.000 to
osd.001. It is reproducible most of the time (but not always) when I
do the following:

# mkfs.ext4 -T largefile /dev/sdb1
# mount -o noatime,user_xattr /dev/sdb1 /ceph/osd.001/
# cosd -i 001 --mkjournal --mkfs --monmap /tmp/monmap
# /usr/bin/cosd -d -i 001 -c /etc/ceph/ceph.conf

### wait until replication has finished and then stop the cosd

# umount /dev/sdb1
# fsck.ext4 -f /dev/sdb1
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Inode 43, i_blocks is 8, should be 16.  Fix<y>? no

Inode 2078, i_blocks is 24, should be 16.  Fix<y>? no

I can also provide an e2image with the metadata and the strace output
of the cosd, if this would be helpful.
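
When the reproduction steps above are repeated many times, it may help to
reduce the e2fsck output to just the affected inodes. The following is a
minimal sketch, not part of the original report: the function name
summarize_iblocks is hypothetical, and it assumes the message format shown
in the e2fsck 1.41.12 output above ("Inode N, i_blocks is X, should be Y.").

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): read e2fsck output on
# stdin and print one line per i_blocks mismatch as "inode found expected".
# Assumes the e2fsck 1.41.12 message format quoted in this message.
summarize_iblocks() {
    sed -n 's/^Inode \([0-9][0-9]*\), i_blocks is \([0-9][0-9]*\), should be \([0-9][0-9]*\)\..*$/\1 \2 \3/p'
}
```

It could be used on the unmounted device as, for example,
"fsck.ext4 -fn /dev/sdb1 | summarize_iblocks", where -n opens the
filesystem read-only and answers "no" to every question.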


2011/8/8 Christian Brunner <>:
> I tried 3.0.1 today, which contains the commit Theodore suggested, and
> was no longer able to reproduce the problem.
> So I think the corruption we have seen is indeed related to:
> commit 7132de744ba76930d13033061018ddd7e3e8cd91
> Author: Maxim Patlasov <>
> Date:   Sun Jul 10 19:37:48 2011 -0400
>   ext4: fix i_blocks/quota accounting when extent insertion fails
> I will now try to apply this patch to the RHEL6.1 kernel and see what
> happens...
> Thanks for your help.
> Christian
> 2011/8/3 Yehuda Sadeh Weinraub <>:
>> On Wed, Aug 3, 2011 at 7:16 AM, Christian Brunner <> wrote:
>> ...
>>> I tried to reproduce this without ceph, but wasn't able to...
>>> In the meantime it seems that I can also see the side effects on the
>>> librbd side: I get a "librbd: data error!" when I do an "rbd copy".
>>> When I look at the librbd code, this is related to a sparse_read not
>>> returning the right size of the object.
>>> I don't know if it helps, but I think that the problem is also related
>>> to sparse file usage.
>> There were a few sparse-read issues that we fixed not too long ago,
>> but should have been fixed for at least the previous ceph version. I'm
>> not sure what version you're using.
>> There was an ext4 fiemap issue that I was hitting on specific
>> environments but couldn't determine whether it was fixed in later
>> kernel versions (I was using 2.6.32). Now is a good time to try and
>> get to the bottom of it. Here's a script I was using to reproduce it:
>> #!/bin/sh
>> dd if=/dev/urandom of=bla bs=1 seek=$((0x6f000)) count=$((0x1000)); sync
>> dd if=/dev/urandom of=bla bs=1 seek=$((0x70000)) count=$((0x1000)); sync
>> dd if=/dev/urandom of=bla bs=1 seek=$((0x71000)) count=$((0x1000)); sync
>> dd if=/dev/urandom of=bla bs=1 seek=$((0x72000)) count=$((0x1000)); sync
>> dd if=/dev/urandom of=bla bs=1 seek=$((0x73000)) count=$((0x1000)); sync
>> dd if=/dev/urandom of=bla bs=1 seek=$((0x74000)) count=$((0x2000)); sync
>> dd if=/dev/urandom of=bla bs=1 seek=$((0x2ae000)) count=$((0x2000)); sync
>> You can compile and run the following utility to dump all the extents:
>> Thanks,
>> Yehuda
>> Oh, btw, you can effectively disable the use of fiemap by setting the
>> 'filestore fiemap threshold' config option to a large enough value
>> (e.g., anything bigger than 4 MB should be enough for rbd).