linux-ext4 - Re: Possible ext4 corruption

Date:	Sat, 14 Mar 2009 00:58:29 +1030
From:	Kevin Shanahan <kmshanah@...b.org.au>
To:	Theodore Tso <tytso@....edu>
Cc:	Andreas Dilger <adilger@....com>,
	Eric Sandeen <sandeen@...hat.com>, linux-ext4@...r.kernel.org
Subject: Re: Possible ext4 corruption - ACL related?

On Thu, 2009-03-12 at 20:55 -0400, Theodore Tso wrote:
> > Inode	Pathname
> > 864	/local/apps/Gestalt.Net/SetupCD/program files/Business Objects/Common/3.5/bin/Cdo32pl.dll
> > 875	/local/apps/Gestalt.Net/SetupCD/program files/Business Objects/Common/3.5/bin/RptControllers.dll
> 
> Well, it's likely those files are corrupted, so you might as well
> delete them and restore from backup if needed/appropriate/possible.

Hrm, deleting these files resulted in:

  EXT4-fs error (device dm-0): ext4_xattr_delete_inode: inode 875: block 118279104364544 read error
  Aborting journal on device dm-0:8.
  Remounting filesystem read-only

Unmounting resulted in:

  EXT4-fs error (device dm-0) in ext4_free_inode: Journal has aborted
  EXT4-fs: mballoc: 6695972 blocks 38497 reqs (34673 success)
  EXT4-fs: mballoc: 4434 extents scanned, 37231 goal hits, 946 2^N hits, 0 breaks, 0 lost
  EXT4-fs: mballoc: 750 generated and it took 6082776
  EXT4-fs: mballoc: 4997239 preallocated, 120774 discarded
  ext4_abort called.
  EXT4-fs error (device dm-0): ext4_put_super: Couldn't clean up the journal

hermes:~# e2fsck -pfv /dev/dm-0
/dev/dm-0: recovering journal
/dev/dm-0: Group descriptor 768 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 769 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 770 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 771 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 772 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 773 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 774 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 775 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 776 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 777 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 778 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 779 checksum is invalid.  FIXED.
/dev/dm-0: Note: if several inode or block bitmap blocks or part
of the inode table require relocation, you may wish to try
running e2fsck with the '-b 32768' option first.  The problem
may lie only with the primary block group descriptors, and
the backup block group descriptors may be OK.

/dev/dm-0: Inode bitmap for group 780 is not in group.  (block 339410944)


/dev/dm-0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)

hermes:~# e2fsck -pfv -b 32768 /dev/dm-0
/dev/dm-0: Group descriptor 0 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 1 checksum is invalid.  FIXED.
/dev/dm-0: Group descriptor 2 checksum is invalid.  FIXED.
....
 (snip messages for every group descriptor in sequence)
....
/dev/dm-0: Group descriptor 8191 checksum is invalid.  FIXED.

  289080 inodes used (0.43%)
   11818 non-contiguous inodes (4.1%)
         # of inodes with ind/dind/tind blocks: 0/0/0
         Extent depth histogram: 288728/287
40783849 blocks used (15.19%)
       0 bad blocks
       2 large files

  263218 regular files
   25794 directories
       0 character device files
       0 block device files
       1 fifo
       0 links
      58 symbolic links (54 fast symbolic links)
       0 sockets
--------
  289071 files

I guess there was some additional badness that e2fsck hadn't picked up
on the previous runs. And dammit, there's still something wrong.
Attempting to copy the backup files back in, the cp command is hanging
and I got another one of these:

attempt to access beyond end of device
dm-0: rw=0, want=946232834916360, limit=2147483648
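
For what it's worth, the "want" value lines up with the xattr block number
from the earlier EXT4-fs error, if one assumes 4 KiB filesystem blocks and
512-byte sectors (neither is shown in the output above, so both are
assumptions). A rough sanity check of that arithmetic:

```python
SECTOR_SIZE = 512   # assumption: dm-0 counts in 512-byte sectors
BLOCK_SIZE = 4096   # assumption: ext4 block size is 4 KiB

SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE


def end_sector(block):
    """Sector just past the end of a read of one filesystem block."""
    return (block + 1) * SECTORS_PER_BLOCK


want = 946232834916360       # from the "attempt to access beyond end of device" message
limit = 2147483648           # device size in sectors, from the same message
block = 118279104364544      # from the ext4_xattr_delete_inode error

device_blocks = limit // SECTORS_PER_BLOCK

print("read of block ends at sector:", end_sector(block))
print("matches 'want':", end_sector(block) == want)
print("device size in blocks:", device_blocks)
print("block is past end of device:", block >= device_blocks)
```

Under those assumptions the device holds 268435456 blocks, which would be
8192 groups of 32768 blocks and agrees with the last group descriptor
reported by e2fsck being 8191. The requested block is far outside that range.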

Okay, I guess the hang has something to do with the stray file which
turned itself into a named pipe. Trying to remove it again:

hermes:~# rm '/srv/samba/local/apps/Gestalt.Net/SetupCD/program files/Business Objects/Common/3.5/bin/RptControllers.dll'

And I get:

Mar 14 00:48:34 hermes kernel: attempt to access beyond end of device
Mar 14 00:48:34 hermes kernel: dm-0: rw=0, want=946232834916360, limit=2147483648
Mar 14 00:48:34 hermes kernel: EXT4-fs error (device dm-0): ext4_xattr_delete_inode: inode 875: block 118279104364544 read error
Mar 14 00:48:34 hermes kernel: Aborting journal on device dm-0:8.
Mar 14 00:48:34 hermes kernel: Remounting filesystem read-only
Mar 14 00:48:34 hermes kernel: EXT4-fs error (device dm-0) in ext4_free_inode: Journal has aborted

Where is this request for block 118279104364544 coming from?

hermes:~# debugfs /dev/dm-0
debugfs 1.41.3 (12-Oct-2008)
debugfs:  stat <875>

Inode: 875   Type: FIFO    Mode:  0611   Flags: 0xb3a9c185
Generation: 3690868    Version: 0x00000000:9d36b10d
User: 27453   Group: 58480   Size: 0
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x49ba6b3a:5a58878c -- Sat Mar 14 00:48:34 2009
 atime: 0x472a2311:00000000 -- Fri Nov  2 05:33:45 2007
 mtime: 0x80c59881:ffffffff -- Fri Jun 18 09:51:21 2038
crtime: 0x49a6c1d1:5f76c580 -- Fri Feb 27 02:52:41 2009
dtime: 0x49ba6b3a -- Sat Mar 14 00:48:34 2009
Size of extra inode fields: 28
BLOCKS:
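
One observation on the stat output: debugfs prints "File ACL: 0", yet the
kernel is trying to read xattr block 118279104364544. Assuming the usual
ext4 on-disk layout, where the xattr block number is kept as a 32-bit low
part plus a 16-bit high part (i_file_acl_high), the block number can be
split to see which half is non-zero. This is only a sketch of that
arithmetic; whether this debugfs version displays the high part is a guess
on my side, not something confirmed here.

```python
def split_file_acl(block):
    """Split a 48-bit block number into (low 32 bits, high 16 bits).

    Assumes the ext4 layout of a 32-bit low field plus a 16-bit
    high field for the xattr block number.
    """
    low = block & 0xFFFFFFFF
    high = (block >> 32) & 0xFFFF
    return low, high


# Block number taken from the ext4_xattr_delete_inode error message.
low, high = split_file_acl(118279104364544)
print("low 32 bits: ", hex(low))
print("high 16 bits:", hex(high))
```

The low half comes out as zero, which would match the "File ACL: 0" line,
while the high half is non-zero. That is consistent with stray data in the
high field of this already damaged inode, though it does not prove it.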

Does having this reliable way to reproduce the bug help?

Cheers,
Kevin.

