[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <508AEEF7.8060301@onlinehome.de>
Date: Fri, 26 Oct 2012 22:13:43 +0200
From: Martin <marogge@...inehome.de>
To: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
CC: Nix <nix@...eri.org.uk>, linux-ext4@...r.kernel.org, tytso@....edu,
stable@...r.kernel.org, gregkh@...uxfoundation.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3
(and other stable branches?)
On 10/24/2012 07:38 PM, Martin wrote:
> On 10/24/2012 01:40 AM, Nix wrote:
>
>> It's true that in less than a week
>> probably not all that many people have rebooted often enough to trip
>> over this.
>>
>> I hope.
>>
>
> [previous bug report]
First off let me apologize for not having the right follow-up headers,
but I am not subscribed and I read the list behind an NNTP gateway.
I have studied my corruption problem more closely and can give you a
description of what happened below. Would you say this may be the same bug?
Thx and regards,
Martin
------------------ snip ---------------------------
Storyboard for my root filesystem crash (source: system logs and memory)
========================================================================
Oct 13 07:48:15
Computer is booted. Computer is then suspended and resumed a few times.
Oct 15 18:43:19
Final resume event before the issue starts. At some point prior to the
next timestamp the computer freezes. Probably just the video hardware
becoming unresponsive, but the teenage user does not know about ssh or
alt-sysreq and decides to turn the killswitch.
He then remembers he is supposed to do a clean shutdown at all times and
boots the computer again in order to perform a clean shutdown:
Oct 15 19:04:20
Computer is booted for the sole reason to be shut down again.
Oct 15 19:05:15
Computer halts. Nothing unusual in the system logs.
Oct 15 19:56:12
Computer is booted again in order to copy a few files to memory stick.
Unbeknownst to me, the following entries are logged in the system log:
Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5):
add_dirent_to_buf:1587: inode #655361: block 2629945: comm mount: bad
entry in directory: rec_len % 4 != 0 - offset=360(360), inode=655682,
rec_len=18, name_len=5
Oct 15 20:00:16 harold kernel: Aborting journal on device sda5-8.
Oct 15 20:00:16 harold kernel: EXT4-fs (sda5): Remounting filesystem
read-only
Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in
ext4_evict_inode:238: Journal has aborted
Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in
ext4_create:2120: IO failure
Oct 15 20:00:16 harold hp-systray: hp-systray[1594]: warning: No hp: or
hpfax: devices found in any installed CUPS queue. Exiting.
Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5):
search_dirblock:1098: inode #655361: block 2629945: comm
dbus-daemon-lau: bad entry in directory: rec_len % 4 != 0 -
offset=360(8552), inode=655682, rec_len=18, name_len=5
Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5):
search_dirblock:1098: inode #655361: block 2629945: comm
dbus-daemon-lau: bad entry in directory: rec_len % 4 != 0 -
offset=360(8552), inode=655682, rec_len=18, name_len=5
Oct 15 20:01:06 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'video' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] No Caching mode page
present
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] Assuming drive cache:
write through
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] No Caching mode page
present
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] Assuming drive cache:
write through
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] No Caching mode page
present
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] Assuming drive cache:
write through
Oct 15 20:01:19 harold udisksd[1710]: Mounted /dev/sdc1 at
/run/media/jan/INTENSO on behalf of uid 1002
Oct 15 20:01:21 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm pool: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:01:59 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'video' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'video' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'video' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:04:54 harold udisksd[1710]: Mounted /dev/sr0 at
/run/media/jan/Sixth Sense on behalf of uid 1002
Oct 15 20:05:31 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm bash: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:05:38 harold last message repeated 16 times
Oct 15 20:06:00 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm ls: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:06:43 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm ls: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:06:59 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm mc: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:06:59 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm mc: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:07:13 harold kernel: nf_conntrack: automatic helper assignment
is deprecated and it will be removed soon. Use the iptables CT target to
attach helpers instead.
Since the memory stick couldn't be accessed I had decided at this point
to look into /etc/groups which is a common cause for this. The file or
directory entry was missing. Further messages while unmounting:
Oct 15 20:09:25 harold kernel: usb 1-1: >USB disconnect, device number 2
Oct 15 20:09:25 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'video' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:09:26 harold udisksd[1710]: Cleaning up mount point
/run/media/jan/INTENSO (device 8:33 no longer exist)
Oct 15 20:09:41 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'video' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:09:41 harold kernel: NVRM: GPU at 0000:02:00:
GPU-918d6d1b-c9fb-6232-601c-99ea6ba768a7
Oct 15 20:09:41 harold acpid: client 1512[0:0] has disconnected
Oct 15 20:09:41 harold acpid: client 1512[0:0] has disconnected
Oct 15 20:09:41 harold acpid: client connected from 2241[0:0]
Oct 15 20:09:41 harold acpid: 1 client rule loaded
Oct 15 20:09:42 harold acpid: client connected from 2241[0:0]
Oct 15 20:09:42 harold acpid: 1 client rule loaded
Oct 15 20:09:42 harold kdm_greet[2245]: Cannot load
/usr/share/apps/kdm/faces/.default.face: Datei oder Verzeichnis nicht
gefunden
Oct 15 20:09:44 harold acpid: client 2241[0:0] has disconnected
Oct 15 20:09:47 harold acpid: client 2241[0:0] has disconnected
At this point I went into runlevel 1 to troubleshoot, but then decided
to reboot and do an fsck from an initrd environment.
Oct 15 20:09:55 harold init: Switching to runlevel: 1
Oct 15 20:09:55 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'video' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:09:58 harold squid: getgrnam failed to find groupid for
effective group 'nogroup'
Oct 15 20:09:58 harold squid: getgrnam failed to find groupid for
effective group 'nogroup'
Oct 15 20:09:58 harold (squid-1): getgrnam failed to find groupid for
effective group 'nogroup'
Oct 15 20:09:58 harold acpid: exiting
Oct 15 20:09:58 harold syslogd 1.5.0: restart.
Oct 15 20:09:58 harold kernel: Kernel logging (proc) stopped.
Oct 15 20:09:58 harold kernel: Kernel log daemon terminating.
Oct 15 20:09:58 harold sshd[1307]: Received SIGHUP; restarting.
Oct 15 20:09:58 harold ntpd[1314]: ntpd exiting on signal 1
Oct 15 20:09:58 harold dbus[1333]: [system] Reloaded configuration
Oct 15 20:09:58 harold sshd[2313]: Server listening on 0.0.0.0 port 22.
Oct 15 20:10:04 harold exiting on signal 15
The fsck fixed some issues and testified a clean state.
The next reboot did not even get logged in the system logs, and logon
was refused.
I have now looked at the filesystem again from an initrd environment.
Result: lost+found contains 577 files, 90 of which are directories. All
of them seem to be from /etc. I will try to rename them to their proper
name on another machine, and restore them on the target machine.
However, due to the sheer number this might take forever. Also I am
worried the problem might re-surface, as it has neither been identified
nor fixed.
Full system logs are available on request.
NB: kernel was v3.5.5 with CK1 and BFQ patches, tainted by nvidia
module. The rootfs was mounted as type ext4 with mount options
discard,noatime,nodiratime,errors=remount-ro
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists