Message-ID: <20090728164142.GA13662@gradator.net>
Date:	Tue, 28 Jul 2009 18:41:42 +0200
From:	Sylvain Rochet <gradator@...dator.net>
To:	Jan Kara <jack@...e.cz>
Cc:	linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
	linux-nfs@...r.kernel.org
Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,


On Tue, Jul 28, 2009 at 03:52:26PM +0200, Jan Kara wrote:
> On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> > On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > > 
> > > > > Can you still see the corruption with 2.6.30 kernel?
> > > > 
> > > > Not upgraded yet, we'll give it a try.
> > 
> > Done, now featuring 2.6.30.3 ;)
> 
> OK, drop me an email if you will see corruption also with this kernel.

Let's move the corrupted directory out of the way ;)

root@...ooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok 
rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
root@...ooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
root@...ooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops


> > > This is probably the misleading output from ext3_iget(). It should give
> > > you EIO in the latest kernel.
> > 
> > root@...ooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca 
> > cat: spip%3Farticle19.f8740dca: Input/output error
> > 
> > It makes much more sense now. We thought the problem was around NFS due 
> > to the previous error message; actually this is probably not the best 
> > path to look down.
> 
> Yes, EIO makes more sense. I think the problem is NFS connected anyway
> though :). But I don't have a clue how it can happen yet. Maybe I can try
> adding some low-cost debugging checks if you'd be willing to run such
> kernel...

Without any problem, we have 24/7/365 physical access and we don't need 
to provide high-availability services.

Anyway, the hosted data isn't that important and there is little or even 
no need for strict confidentiality, so we will be happy to provide ssh 
access to anyone who would like to look deeper into this issue.


> I'm adding to CC linux-nfs just in case someone has an idea.
> 
> > >   Ah, OK, here's the problem. The directory points to a file which is
> > > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > > to indicate that the file was correctly deleted (you might check that the
> > > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> > 
> > root@...ooka:~# debugfs /dev/md10
> > debugfs 1.40-WIP (14-Nov-2006)
> > debugfs:  icheck 88541562
> > Block   Inode number
> > 88541562        <block not found>
> 
> Ah, wrong debugfs command. I should have written:
> testi <88541562>

debugfs:  testi <88541562>
Inode 88541562 is not in use


> > >   The question is how it could happen the directory still points to the
> > > inode. Really strange. It looks as if we've lost a write to the directory
> > > but I don't see how. Are there any suspicious kernel messages in this case?
> > 
> > There was nothing for a while, but since the reboot there are some 
> > about this inode: 
> > 
> > EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562
> 
> Yes, that's to be expected given the corruption. Any NFS error messages?

There are some error messages on NFS clients, however they are quite old.

Apr 19 15:38:21 gin kernel: NFS: Buggy server - nlink == 0!
May  3 20:00:52 gin kernel: NFS: Buggy server - nlink == 0!
May  3 23:24:03 gin kernel: NFS: Buggy server - nlink == 0!
May  7 11:40:57 gin kernel: NFS: Buggy server - nlink == 0!
May  7 14:41:02 gin kernel: NFS: Buggy server - nlink == 0!
May 26 11:10:42 cognac kernel: NFS: Buggy server - nlink == 0!
May 26 11:13:28 cognac kernel: NFS: Buggy server - nlink == 0!
May 26 12:34:39 cognac kernel: NFS: Buggy server - nlink == 0!
May 26 12:39:43 cognac kernel: NFS: Buggy server - nlink == 0!

This is obviously related to the corruption.
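
For anyone wanting to check their own clients, here is a minimal sketch that 
counts these warnings per host. It assumes the classic syslog line layout 
shown above (month, day, time, host, "kernel:"); the function name 
count_nlink_warnings is hypothetical.

```python
import re
from collections import Counter

# Classic syslog prefix ("May  3 20:00:52 gin kernel: ...") followed by the
# NFS client warning quoted above.
NLINK_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel: "
    r"NFS: Buggy server - nlink == 0!"
)


def count_nlink_warnings(lines):
    """Count 'nlink == 0' warnings per NFS client host name."""
    counts = Counter()
    for line in lines:
        match = NLINK_RE.match(line)
        if match:
            counts[match.group("host")] += 1
    return counts
```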



Sylvain
