linux-kernel - Re: Btrfs: blocked for more than 120 seconds, made worse by 3.2 rc7


Message-ID: <4EFB90F2.9030107@gmail.com>
Date:	Wed, 28 Dec 2011 23:58:10 +0200
From:	Konstantinos Skarlatos <k.skarlatos@...il.com>
To:	Dave Chinner <david@...morbit.com>
CC:	linux-kernel@...r.kernel.org,
	Linux Btrfs <linux-btrfs@...r.kernel.org>,
	Chris Mason <chris.mason@...cle.com>,
	linux-raid@...r.kernel.org
Subject: Re: Btrfs: blocked for more than 120 seconds, made worse by 3.2 rc7

On Wednesday, 28 December 2011 11:48:32 PM, Dave Chinner wrote:
> On Wed, Dec 28, 2011 at 09:26:07PM +0200, Konstantinos Skarlatos wrote:
>> Hello all:
>> I have two machines with btrfs that give me the "blocked for more
>> than 120 seconds" message. After that I cannot write anything to
>> disk, I am unable to unmount the btrfs filesystem, and I can only
>> reboot with sysrq-trigger.
>>
>> It always happens when I write many files with rsync over the network.
>> When I used 3.2rc6 it happened randomly on both machines after
>> 50-500GB of writes. With rc7 it happens after far fewer writes,
>> probably 10GB or so, but only on machine 1 for the time being.
>> Machine 2 has not crashed yet after 200GB of writes and I am still
>> testing that.
>>
>> machine 1: btrfs on a 6TB sparse file, mounted as loop, on an XFS
>> filesystem that lies on a 10TB md RAID5. mount options
>> compress=zlib,compress-force
>>
>> machine 2: btrfs over md RAID5 (4x2TB) = 5.5TB filesystem. mount
>> options compress=zlib,compress-force
>>
>> pastebins:
>>
>> machine1:
>> 3.2rc7 http://pastebin.com/u583G7jK
>> 3.2rc6 http://pastebin.com/L12TDaXa
>
> These two are caused by it taking longer than 120s for XFS to fsync
> the loop file. Writing a significant chunk of a sparse 6TB file on a
> software RAID5 volume is going to take some time. However, if IO
> is not occurring, then somewhere below XFS an IO has gone missing
> (MD or hardware problem) because the fsync on the XFS file is
> blocked waiting for an IO completion.
>
>> machine2:
>> 3.2rc6 http://pastebin.com/khD0wGXx
>> 3.2rc7 (not crashed yet)
It crashed a few hours ago; here is the rc7 pastebin:
http://pastebin.com/gvfUm0az
>
> These don't have XFS in the picture, but also appear to be hung
> waiting on IO completion with MD stuck in
> make_request()->get_active_stripe(). That, to me, indicates an MD
> problem.....
>
I have added the linux-raid mailing list.
Please CC me on replies, because I am not subscribed.
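
For anyone comparing the traces across machines, a minimal sketch of how the hung-task reports could be pulled out of a saved dmesg or pastebin dump. It assumes the usual "INFO: task NAME:PID blocked for more than N seconds." line emitted by the kernel's hung-task detector; the function names are hypothetical helpers for illustration and are not taken from any of the pastebins.

```python
import re

# Matches the hung-task detector line, for example:
#   "INFO: task rsync:1234 blocked for more than 120 seconds."
# The task name and PID in that example are made up for illustration.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) for each hung-task report."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


def count_by_task(hits):
    """Count how many reports each task name produced."""
    counts = {}
    for comm, _pid, _secs in hits:
        counts[comm] = counts.get(comm, 0) + 1
    return counts
```

Feeding it the text of each pastebin and comparing the per-task counts gives a quick view of whether the blocked tasks are the same on both machines; the stack traces that follow each report still have to be read by hand.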

> Cheers,
>
> Dave.