Message-ID: <20140805131555.GB18164@fishbowl.rw.madduck.net>
Date:	Tue, 5 Aug 2014 15:15:55 +0200
From:	martin f krafft <madduck@...duck.net>
To:	Theodore Ts'o <tytso@....edu>,
	linux kernel mailing list <linux-kernel@...r.kernel.org>
Subject: Re: EXT4-fs error, kernel BUG

also sprach Theodore Ts'o <tytso@....edu> [2014-08-05 14:51 +0200]:
> One likely cause of this issue is that the hardware hiccuped on
> a read, and returned garbage, which is what triggered the "EXT4-fs
> error" message (which is really a report of a detected file system
> inconsistency).  A common cause of this is the block address
> getting corrupted, so that the hard drive read the correct data
> from the wrong location.

This sounds like it would happen every time and fsck would catch it.

> The other likely cause is that you are using something like RAID1,
> and one of the copies of the disk block really is corrupted, and
> the kernel read the bad version of the block, but fsck managed to
> read the good version of the block.

It's a RAID10 (using md), so this is a good guess, actually. Which is
bad news for me, because RAID corruption is not nice: when you have
two clocks, you won't know what time it is anymore…
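
One way to look for diverging mirror copies on md is the per-array
mismatch_cnt counter in sysfs, which the kernel updates during a
"check" pass (started by writing "check" to the array's
md/sync_action file as root). What follows is only a minimal sketch,
not something from this thread: the helper name read_mismatch_counts
and its sys_block parameter are made up for illustration, and
a non-zero count is a hint to investigate rather than proof of
corruption.

```python
from pathlib import Path


def read_mismatch_counts(sys_block="/sys/block"):
    """Return {array_name: mismatch_cnt} for each md array under sys_block.

    The value is None when the counter cannot be read or parsed.
    The counter is only meaningful after a "check" or "repair" pass.
    """
    counts = {}
    # md arrays appear as /sys/block/mdN, with their state under mdN/md/
    for counter in sorted(Path(sys_block).glob("md*/md/mismatch_cnt")):
        array = counter.parent.parent.name
        try:
            counts[array] = int(counter.read_text().strip())
        except (OSError, ValueError):
            counts[array] = None
    return counts


if __name__ == "__main__":
    for array, count in read_mismatch_counts().items():
        print(f"{array}: mismatch_cnt={count}")
```

The sys_block parameter exists only so the function can be pointed at
a scratch directory for testing; on a real system the default applies.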

Fortunately, I have now managed to tar the filesystem contents
elsewhere without error, so in theory all I have to do now is
recreate the array. And I'll recreate the filesystem while we're at it.
That should teach RAID10 again…

I'd still like to drill down to the memory problem…

> It's possible that this was caused by a memory corruption, but it
> wouldn't have been high on my suspect list.  Still, if this is
> a new machine, it might not be a bad idea to run memtest86+ for
> 24-48 hours.

… and will do that. I did it before, but I also just upgraded the
RAM and didn't do it again.

Thank you, tytso. Hope to see you at DC14…

-- 
@martinkrafft | http://madduck.net/ | http://two.sentenc.es/
 
"not the truth in whose possession any man is, or thinks he is, but
 the honest effort he has made to find out the truth, is what
 constitutes the worth of man."
                                                   -- gotthold lessing
 
spamtraps: madduck.bogus@...duck.net
