lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20070731015300.GM31489@sgi.com>
Date:	Tue, 31 Jul 2007 11:53:00 +1000
From:	David Chinner <dgc@....com>
To:	Ryan Bair <ryandbair@...il.com>
Cc:	linux-kernel@...r.kernel.org, xfs@....sgi.com
Subject: Re: Repeated XFS corruption -Corruption of in-memory data detected

[cc xfs@....sgi.com]

On Mon, Jul 30, 2007 at 12:10:52PM -0400, Ryan Bair wrote:
> Kernel: 2.6.18-4-amd64 (Debian 2.6.18.dfsg.1-12etch2) Debian Etch
> System: Dell PowerEdge 1850
> Processor: 3.2 GHz Intel Xeon w/ microcode v1.14a, Hyperthreading disabled.
> RAM: 2x1GB ECC DDR-400
> RAID Controller: Dell PERC5/E using megaraid driver
> 
> I got another unexpected error on my XFS partition today. I was able
> to reboot the system normally and the journal recovered on the
> following mount. Shortly thereafter, the error occurred again. After
> this the filesystem was no longer able to be mounted as the error
> would occur immediately.
> 
> The volume is on a 9.5TB LVM2 volume on a Dell MD1000 loaded with 15
> 750GB drives in a RAID5 set. Writeback is disabled. Memtest86+ was run
> on this system for 48 hours without fault. The system is otherwise
> stable.

<sigh>

You're the second person today to report a software RAID5+XFS corruption on
the 2.6.18-4 Debian kernel. Almost the same signature as well - that is a
corrupted free space btree.

> XFS was able to repair the damage, but previously the drive returned
> to its corrupted state within a few hours of heavy I/O.

The other report was a shutdown before corruption got to disk,
so maybe they are different problems.

Can you post the repair output so we can see what the damage was?
Also, can you post your md/dm config so I can see if I can recreate
a similar config?

Also, seeing as the previous report was caught before corruption
got to disk, I suspected memory corruption of some kind. Can
you enable slab, vm and filesystem debugging for you kernel and
run with that?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ