linux-ext4 - [Bug 12821] filesystem corrupts on heavy I/O

Message-Id: <20090313150345.7BA4B108042@picon.linux-foundation.org>
Date:	Fri, 13 Mar 2009 08:03:45 -0700 (PDT)
From:	bugme-daemon@...zilla.kernel.org
To:	linux-ext4@...r.kernel.org
Subject: [Bug 12821] filesystem corrupts on heavy I/O

http://bugzilla.kernel.org/show_bug.cgi?id=12821

------- Comment #15 from sandeen@...hat.com  2009-03-13 08:03 -------
(In reply to comment #14)
> I'm seeing a similar error with heavy read/write I/O on a 1TB ext4 volume. It's
> not clear what behavior triggers the error for me. Occasionally I see the error
> in dmesg...
> 
> [ 7829.004269] EXT4-fs error (device sdb1): ext4_ext_search_right: bad header
> in inode #2491097: unexpected eh_depth - magic f30a, entries 78, max 340(0),
> depth 1(2)
> [ 7829.012197] mpage_da_map_blocks block allocation failed for inode 2491097 at
> logical offset 2788227 with max blocks 7 with error -5
> [ 7829.012220] This should not happen.!! Data will be lost
> 
> ...and sync does not complete. It's not clear from the discussion if e2image
> needs to be done before the error occurs on a clean mount, or afterward while
> the system is up.

If you like, it's probably sufficient to point debugfs at the filesystem and do

debugfs> stat <2491097>

to give us an idea of the layout of that file.
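
For reference, the same request can be issued non-interactively. This is only a sketch: it assumes the affected filesystem is /dev/sdb1 (the device named in the EXT4-fs error above) and uses the inode number from that message; the helper name debugfs_stat_cmd is made up for illustration. debugfs opens the filesystem read-only unless -w is given, and -R runs a single request and exits.

```shell
# Assumptions: device and inode are taken from the dmesg output quoted above;
# override them through the environment if your setup differs.
DEVICE=${DEVICE:-/dev/sdb1}
INODE=${INODE:-2491097}

# Hypothetical helper: build the one-shot debugfs invocation as a string.
# $1 = inode number, $2 = block device holding the ext4 filesystem.
debugfs_stat_cmd() {
    printf 'debugfs -R "stat <%s>" %s\n' "$1" "$2"
}

# Print the command so it can be reviewed before it is run as root.
debugfs_stat_cmd "$INODE" "$DEVICE"
```

Running the printed command as root and redirecting its output to a file gives something that can be attached to the bug report.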

> When this happens the reboot procedure does not complete, and I power cycle the
> machine. 

The fs may have gone read-only after the error, but I'm not sure that should
have hung the system.

> After a power cycle, the journal replays and fsck completes, no
> inconsistencies. The files affected by data loss are part of a BitTorrent
> network download and after a Torrent data consistency check, I confirm that
> data has been lost. If I leave the Torrent active downloading for more than an
> hour or so, the ext4 errors occur.

Great, if you can reproduce it w/ bittorrent, can you please try the patch in
the attachments?

Thanks,
-Eric
