Message-ID: <20130509091223.11334.qmail@science.horizon.com>
Date:	9 May 2013 05:12:23 -0400
From:	"George Spelvin" <linux@...izon.com>
To:	linux-ext4@...r.kernel.org
Cc:	linux@...izon.com
Subject: Sporadic metadata_csum errors

This is a "does this ring any bells" report, not yet a formal Bug Report.
As I said, I'm not 100% sure of the hardware, and metadata_csum may just
be catching previously invisible corruption.

One computer I've been running metadata_csum on keeps bombing out
every couple of weeks with a kernel error like

[178447.345677] EXT4-fs error (device sda2): ext4_iget:4192: inode #892311: comm udisks-helper-a: checksum invalid
[178447.345682] Aborting journal on device sda2-8.
[178447.345891] EXT4-fs (sda2): Remounting filesystem read-only
[178447.346158] EXT4-fs error (device sda2): ext4_iget:4192: inode #892311: comm udisks-helper-a: checksum invalid
[180246.010747] EXT4-fs error (device sda2): ext4_iget:4192: inode #892311: comm udisks-helper-a: checksum invalid
[180246.011846] EXT4-fs error (device sda2): ext4_iget:4192: inode #892311: comm udisks-helper-a: checksum invalid

# debugfs -n /dev/sda2
debugfs 1.43-WIP (22-Sep-2012)
debugfs:  stat <892311>
Inode: 892311   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 444873085    Version: 0x00000000:00000001
User:     0   Group:     0   Size: 322
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x502c7f65:37e9cca0 -- Thu Aug 16 01:04:37 2012
 atime: 0x5189802e:431ad914 -- Tue May  7 18:29:02 2013
 mtime: 0x5027d0d3:00000000 -- Sun Aug 12 11:50:43 2012
crtime: 0x502c7f06:c76d7c10 -- Thu Aug 16 01:03:02 2012
Size of extra inode fields: 28
Inode checksum: 0x9fe97671
EXTENTS:
(0):7475957

However, without -n, I get "stat: Inode checksum does not match inode
while reading inode 892311".  I can read 892310 and 892312.

A few notes:
* Processor is i3-530, with SSE 4.2 (CRC32)
* The hardware has been pretty good to me for about a year, but I'm
  not 100% sure of it.  I used to be annoyed that the RAM inexplicably
  wasn't stable at 1600 MHz, then after figuring out that it could run
  mprime (prime95) for 24 hours at 1530 MHz, I discovered that it was
  specified for 1333.  :-(  I haven't re-run that stability test lately.
* This has happened at least four times so far.  It's always the root file
  system, and not /home on /dev/sda3, even though they're configured
  almost identically.  I have e2fsck logs from the last two (and this one,
  as soon as I fix it).
* Each time, e2fsck finds a couple of corrupted inodes and no other damage.
* This latest is with 3.9 + the ext4/dev tree, which fixed a metadata_csum
  bug.  I held off reporting the others because there was a known bugfix I
  didn't have.
* The file (/etc/udev/udev.conf in this case) appears uncorrupted when
  examined with debugfs -n.  (But mi doesn't fix the checksum :-(.)
* ctime and mtime are both very old (although atime is only about three
  hours before the first error, despite relatime).
* Is there an existing tool to analyze an inode and look for single-bit
  errors?
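On the single-bit question, the brute-force idea is easy to sketch.  The Python below is a minimal, hypothetical helper (the names crc32c and single_bit_candidates are made up for illustration): a bit-at-a-time CRC-32C, the Castagnoli polynomial that the SSE 4.2 CRC32 instruction computes, plus a loop that flips each bit of a buffer and reports any flip that makes the checksum match.  It assumes a plain CRC-32C over one contiguous buffer with the checksum stored separately.  ext4's real inode checksum also folds in a per-filesystem seed, the inode number and the generation, treats the checksum fields themselves specially, and may use a different init/final-XOR convention, so applying this to a raw inode would need that preprocessing first.

```python
# Sketch only: plain CRC-32C over one buffer, checksum stored separately.
# This is NOT ext4's full inode-checksum procedure (see the note above).


def crc32c(data: bytes) -> int:
    """Bit-at-a-time CRC-32C (Castagnoli), reflected poly 0x82F63B78.

    Uses the common init/final-XOR of 0xFFFFFFFF; slow, but easy to audit
    against a hardware (SSE 4.2) result.
    """
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def single_bit_candidates(data: bytes, stored: int) -> list:
    """List single-bit errors that would explain crc32c(data) != stored.

    ("data", i): flipping bit i of data (bit 0 = LSB of byte 0) makes it match.
    ("csum", k): data is fine but bit k of the stored checksum is flipped.
    Brute force, quadratic in the buffer length; a few seconds for 256 bytes.
    """
    found = []
    diff = crc32c(data) ^ stored
    if diff and diff & (diff - 1) == 0:        # exactly one bit differs
        found.append(("csum", diff.bit_length() - 1))
    buf = bytearray(data)
    for i in range(len(buf) * 8):
        buf[i // 8] ^= 1 << (i % 8)            # flip bit i
        if crc32c(bytes(buf)) == stored:
            found.append(("data", i))
        buf[i // 8] ^= 1 << (i % 8)            # put it back
    return found


if __name__ == "__main__":
    assert crc32c(b"123456789") == 0xE3069283  # standard CRC-32C check value
    record = bytes(range(64))                  # made-up 64-byte test record
    good = crc32c(record)
    damaged = bytearray(record)
    damaged[10] ^= 0x04                        # flip bit 2 of byte 10 (bit 82)
    print(single_bit_candidates(bytes(damaged), good))  # [('data', 82)]
```

Because CRC is linear, the bit position could in principle be solved for directly instead of by brute force, but the loop is easier to trust when the hardware itself is under suspicion.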

This time, unlike other times, the inode that reported the error did
NOT show a checksum error after reboot:

Script started on Wed May  8 13:18:09 2013
# e2fsck -v /dev/sda2
e2fsck 1.43-WIP (22-Sep-2012)
root contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found.  Fix<y>? yes
Inode 3176 was part of the orphaned inode list.  FIXED.
Inode 474898 was part of the orphaned inode list.  FIXED.
Inode 578439 was part of the orphaned inode list.  FIXED.
Inode 587654 was part of the orphaned inode list.  FIXED.
Inode 588111 was part of the orphaned inode list.  FIXED.
Deleted inode 630260 has zero dtime.  Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(3517280--3517298) -(5091968--5092017) -(5868334--5868477) -(5890927--5890943) -(7769088--7769208) -(8725504--8726308)
Fix<y>? yes
Free blocks count wrong for group #107 (32749, counted=32768).
Fix<y>? yes
Free blocks count wrong for group #155 (27000, counted=27050).
Fix<y>? yes
Free blocks count wrong for group #179 (21646, counted=21807).
Fix<y>? yes
Free blocks count wrong for group #237 (27958, counted=28079).
Fix<y>? yes
Free blocks count wrong for group #266 (19765, counted=20570).
Fix<y>? yes
Free blocks count wrong (5796540, counted=5797696).
Fix<y>? yes
Inode bitmap differences:  -3176 -474898 -578439 -587654 -588111 -630260
Fix<y>? yes
Free inodes count wrong for group #0 (14, counted=15).
Fix<y>? yes
Free inodes count wrong for group #144 (0, counted=1).
Fix<y>? yes
Free inodes count wrong for group #176 (1, counted=2).
Fix<y>? yes
Free inodes count wrong for group #179 (379, counted=381).
Fix<y>? yes
Free inodes count wrong for group #192 (239, counted=240).
Fix<y>? yes
Free inodes count wrong (684282, counted=684288).
Fix<y>? yes

root: ***** FILE SYSTEM WAS MODIFIED *****
root: ***** REBOOT LINUX *****

      296432 inodes used (30.23%, out of 980720)
         183 non-contiguous files (0.1%)
         291 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 266628/113
     3967815 blocks used (40.63%, out of 9765511)
           0 bad blocks
           0 large files

      242897 regular files
       21904 directories
         164 character device files
          10 block device files
           1 fifo
          36 links
       31435 symbolic links (29496 fast symbolic links)
          12 sockets
------------
      296459 files
# ls -l /etc/udev
total 12
-rw-r--r-- 1 root root  281 Jun  6  2010 links.conf
drwxr-xr-x 2 root root 4096 Mar 30  2012 rules.d
-rw-r--r-- 1 root root  322 Aug 12  2012 udev.conf
columbia[503]# cat /etc/udev/udev.conf
# The initial syslog(3) priority: "err", "info", "debug" or its
# numerical equivalent. For runtime debugging, the daemons internal
# state can be changed with: "udevadm control --log-priority=<value>".
#
# udevd is started in the initramfs, so when this file is modified the
# initramfs should be rebuilt.
udev_log="err"
columbia[504]# debugfs /dev/sda2
debugfs 1.43-WIP (22-Sep-2012)
debugfs:  cd /etc/udev
debugfs:  ls
 896276  (12) .    845603  (12) ..    892311  (20) udev.conf   
 896278  (12) .dev    896279  (16) rules.d    896282  (4012) links.conf   
debugfs:  stat udev.conf
Inode: 892311   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 444873085    Version: 0x00000000:00000001
User:     0   Group:     0   Size: 322
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x502c7f65:37e9cca0 -- Thu Aug 16 01:04:37 2012
 atime: 0x5189802e:431ad914 -- Tue May  7 18:29:02 2013
 mtime: 0x5027d0d3:00000000 -- Sun Aug 12 11:50:43 2012
crtime: 0x502c7f06:c76d7c10 -- Thu Aug 16 01:03:02 2012
Size of extra inode fields: 28
Inode checksum: 0x9ed4b13c
EXTENTS:
(0):7475957
debugfs:  columbia[505]# exit

Script done on Wed May  8 13:19:35 2013
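The transcript also allows a couple of quick arithmetic cross-checks, sketched below in Python (the helper names are hypothetical).  It assumes 32768 blocks per group, the mke2fs default for 4 KiB blocks, since the block size isn't shown.  Under that assumption the ranges in the "Block bitmap differences" line add up, group by group, to exactly the free-block corrections e2fsck made, and to the 1156-block change in the total.  Separately, the two "Inode checksum" values debugfs printed for inode 892311 (0x9fe97671 in the first stat, 0x9ed4b13c in the second) differ in 15 of 32 bits; that alone says little about how many bits of the inode changed, because even a single flipped input bit can change many checksum bits.

```python
# Cross-checks on the e2fsck/debugfs transcript. ASSUMPTION: 32768 blocks
# per group (mke2fs default for 4 KiB blocks); the transcript doesn't say.
BLOCKS_PER_GROUP = 32768

# Inclusive ranges from e2fsck's "Block bitmap differences" line.
RANGES = [(3517280, 3517298), (5091968, 5092017), (5868334, 5868477),
          (5890927, 5890943), (7769088, 7769208), (8725504, 8726308)]


def freed_per_group(ranges, per_group=BLOCKS_PER_GROUP):
    """Count the blocks of each inclusive range, keyed by block group."""
    counts = {}
    for start, end in ranges:
        for block in range(start, end + 1):
            group = block // per_group  # assumes first data block is 0
            counts[group] = counts.get(group, 0) + 1
    return counts


def differing_bits(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    per_group = freed_per_group(RANGES)
    print(per_group)  # {107: 19, 155: 50, 179: 161, 237: 121, 266: 805}
    print(sum(per_group.values()), 5797696 - 5796540)  # 1156 1156
    print(differing_bits(0x9FE97671, 0x9ED4B13C))      # 15
```

Each per-group count matches the corresponding "Free blocks count wrong" line, e.g. group #179: 21807 - 21646 = 161.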
