Message-ID: <20090821105156.GA9996@gradator.net>
Date:	Fri, 21 Aug 2009 12:51:56 +0200
From:	Sylvain Rochet <gradator@...dator.net>
To:	Simon Kirby <sim@...tway.ca>
Cc:	Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
	linux-ext4@...r.kernel.org, linux-nfs@...r.kernel.org,
	Al Viro <viro@...iv.linux.org.uk>,
	Sylvain Rochet <gradator@...dator.net>
Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,


On Thu, Aug 20, 2009 at 05:00:35PM -0700, Simon Kirby wrote:
> On Thu, Aug 20, 2009 at 07:19:53PM +0200, Sylvain Rochet wrote:
> 
> > So, everything is fine, but the problem happened only one time on this 
> > server, so we cannot conclude anything after a few weeks. However, 
> > I now have physical access back, so we will switch back to the former 
> > server where the problem happened quite frequently, then we will see!
> 
> Not to derail the thread, but you were definitely seeing the same issues
> with stock 2.6.30.4, right?

Nope, the last issue we had came from 2.6.28.9.

We upgraded to 2.6.30.3 on Jan's advice, then we "upgraded" to 
2.6.30.3 with Jan's first patch, which adds some debug output 
(0001-ext3-Debug-unlinking-of-inodes.patch). Finally we upgraded to 
2.6.30.4 with both Jan's first patch and his second one 
(0001-fs-Make-sure-data-stored-into-inode-is-properly-see.patch), which 
adds an smp_mb() to the unlock_new_inode() function.


> We had all sorts of corruption happening for files served via NFS with 
> 2.6.28 and 2.6.29, but everything was magically fixed on 2.6.30 
> (though we needed a lot of fscking).  I never did track down what 
> change fixed it, since it took a while to reproduce.

Same here, everything has been fine since 2.6.30. In a few days we will 
switch back to the quad-core server where the corruption happened. We 
are now using a dual-Opteron server because we suspected hardware issues 
on the quad-core, but the corruption also happened once on the 
dual-Opteron (which is IMHO sufficient evidence to rule out a hardware 
issue). I guess the issue was (or is) somehow SMP-related.

And yep, we also spent a long time playing with fsck ;-) Luckily, the 
corruption only affects new files, and new files are mostly caches, 
sessions, logs, and such, so fsck used its chainsaw on files that were 
not really important.


> Hmm.  I just noticed what seems to be a new occurrence of "deleted inode
> referenced" on a box with 2.6.30.  We saw many when we first upgraded to
> 2.6.30 due to the corruption caused by 2.6.29, but those all occurred
> within a day or so and were fsck'd.  I would have thought the backup
> sweeps would have tripped over that inode way before now...
> 
> Just wondering if you can confirm that the errors you saw with 2.6.30.4
> were not leftover from older kernels.

The few corrupted inodes from 2.6.28.9 (and earlier) were pushed to 
lost+found so that they would not be used again. We ran an fsck when we 
moved to 2.6.30.4, and it fixed everything. We have not had any 
corruption with 2.6.30.4 so far.


Sylvain

