Message-ID: <20090716172749.GC3740@atrey.karlin.mff.cuni.cz>
Date: Thu, 16 Jul 2009 19:27:49 +0200
From: Jan Kara <jack@...e.cz>
To: Sylvain Rochet <gradator@...dator.net>
Cc: linux-kernel@...r.kernel.org
Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption
Hi,
> We (TuxFamily) are having some inode corruption on an NFS server.
>
> So, let's start with the facts.
>
>
> ==== NFS Server
>
> Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
Can you still see the corruption with the 2.6.30 kernel?
...
> /dev/md10 on /data type ext3 (rw,noatime,nodiratime,grpquota,commit=5,data=ordered)
>
> ==> We used data=writeback, then fell back to data=ordered;
>     the problem is still there.
>
...
>
> # df -m
> /dev/md10 1378166 87170 1290997 7% /data
1.3 TB, a large filesystem ;).
> # df -i
> /dev/md10 179224576 3454822 175769754 2% /data
>
>
>
> ==== NFS Clients
>
> 6x Linux cognac 2.6.28.9-grsec #1 SMP Sun Apr 12 13:06:49 CEST 2009 i686 GNU/Linux
> 5x Linux martini 2.6.28.9-grsec #1 SMP Tue Apr 14 00:01:30 UTC 2009 i686 GNU/Linux
> 2x Linux armagnac 2.6.28.9 #1 SMP Tue Apr 14 08:59:12 CEST 2009 i686 GNU/Linux
>
> x.x.x.x:/data/... on /data/... type nfs (rw,noexec,nosuid,nodev,async,hard,nfsvers=3,udp,intr,rsize=32768,wsize=32768,timeo=20,addr=x.x.x.x)
>
> ==> All NFS exports are mounted this way, sometimes with the 'sync'
>     option (e.g. for web sessions).
> ==> Those are often mounted from outside chroots into chroots;
>     probably an irrelevant detail.
...
> ==== So, now, going into the problem
>
> The kernel log is not being kind to us. Here, on the NFS server:
>
> Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> And so on...
If you can still see this problem, could you run: debugfs /dev/md10
and send the output of the command:
stat <40420228>
(or whatever the corrupted inode number is)
and also:
dump <40420228> /tmp/corrupted_dir
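The same can be done non-interactively via debugfs's -R option if that is
easier to script, e.g. (substitute whatever inode number the warnings report):

  debugfs -R 'stat <40420228>' /dev/md10
  debugfs -R 'dump <40420228> /tmp/corrupted_dir' /dev/md10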
> And more recently...
> Apr 2 22:19:01 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40780223), 0
> Apr 2 22:19:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40491685), 0
> Apr 11 07:23:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (174301379), 0
> Apr 20 08:13:32 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (54942021), 0
>
>
> Not much in the kernel logs of the NFS clients, and most of the history is lost,
> but we did catch some of these:
>
> ....................: NFS: Buggy server - nlink == 0!
>
>
> == Going deeper into the problem
>
> Something like this is quite common:
>
> root@...ooka:/data/...# ls -la
> total xxx
> drwxrwx--- 2 xx xx 4096 2009-04-20 03:48 .
> drwxr-xr-x 7 root root 4096 2007-01-21 13:15 ..
> -rw-r--r-- 1 root root 0 2009-04-20 03:48 access.log
> -rw-r--r-- 1 root root 70784145 2009-04-20 00:11 access.log.0
> -rw-r--r-- 1 root root 6347007 2009-04-10 00:07 access.log.10.gz
> -rw-r--r-- 1 root root 6866097 2009-04-09 00:08 access.log.11.gz
> -rw-r--r-- 1 root root 6410119 2009-04-08 00:07 access.log.12.gz
> -rw-r--r-- 1 root root 6488274 2009-04-07 00:08 access.log.13.gz
> ?--------- ? ? ? ? ? access.log.14.gz
> ?--------- ? ? ? ? ? access.log.15.gz
> ?--------- ? ? ? ? ? access.log.16.gz
> ?--------- ? ? ? ? ? access.log.17.gz
> -rw-r--r-- 1 root root 6950626 2009-04-02 00:07 access.log.18.gz
> ?--------- ? ? ? ? ? access.log.19.gz
> -rw-r--r-- 1 root root 6635884 2009-04-19 00:11 access.log.1.gz
> ?--------- ? ? ? ? ? access.log.20.gz
> ?--------- ? ? ? ? ? access.log.21.gz
> ?--------- ? ? ? ? ? access.log.22.gz
> ?--------- ? ? ? ? ? access.log.23.gz
> ?--------- ? ? ? ? ? access.log.24.gz
> ?--------- ? ? ? ? ? access.log.25.gz
> ?--------- ? ? ? ? ? access.log.26.gz
> -rw-r--r-- 1 root root 6616546 2009-03-24 00:07 access.log.27.gz
> ?--------- ? ? ? ? ? access.log.28.gz
> ?--------- ? ? ? ? ? access.log.29.gz
> -rw-r--r-- 1 root root 6671875 2009-04-18 00:12 access.log.2.gz
> ?--------- ? ? ? ? ? access.log.30.gz
> -rw-r--r-- 1 root root 6347518 2009-04-17 00:10 access.log.3.gz
> -rw-r--r-- 1 root root 6569714 2009-04-16 00:12 access.log.4.gz
> -rw-r--r-- 1 root root 7170750 2009-04-15 00:11 access.log.5.gz
> -rw-r--r-- 1 root root 6676518 2009-04-14 00:12 access.log.6.gz
> -rw-r--r-- 1 root root 6167458 2009-04-13 00:11 access.log.7.gz
> -rw-r--r-- 1 root root 5856576 2009-04-12 00:10 access.log.8.gz
> -rw-r--r-- 1 root root 6644142 2009-04-11 00:07 access.log.9.gz
>
>
> root@...ooka:/data/...# cat * # output filtered, only errors
> cat: access.log.14.gz: Stale NFS file handle
> cat: access.log.15.gz: Stale NFS file handle
> cat: access.log.16.gz: Stale NFS file handle
> cat: access.log.17.gz: Stale NFS file handle
> cat: access.log.19.gz: Stale NFS file handle
> cat: access.log.20.gz: Stale NFS file handle
> cat: access.log.21.gz: Stale NFS file handle
> cat: access.log.22.gz: Stale NFS file handle
> cat: access.log.23.gz: Stale NFS file handle
> cat: access.log.24.gz: Stale NFS file handle
> cat: access.log.25.gz: Stale NFS file handle
> cat: access.log.26.gz: Stale NFS file handle
> cat: access.log.28.gz: Stale NFS file handle
> cat: access.log.29.gz: Stale NFS file handle
> cat: access.log.30.gz: Stale NFS file handle
>
>
> "Stale NFS file handle"... on the NFS Server... hummm...
>
>
> == Other facts
>
> fsck.ext3 fixed the filesystem but didn't fix the problem.
>
> mkfs.ext3 didn't fix the problem either.
You might want to try disabling the DIR_INDEX feature and see whether
the corruption still occurs...
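Roughly something like the following should do it (just a sketch; it assumes
the filesystem can be unmounted and that an e2fsck -D pass afterwards rebuilds
the directories without the hash-tree indexes):

  umount /data
  tune2fs -O ^dir_index /dev/md10   # clear the dir_index (htree) feature flag
  e2fsck -f -D /dev/md10            # -D rebuilds/optimizes all directories
  mount /data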
> It only affects files which have been recently modified: logs, awstats
> hashfiles, website caches, sessions, locks, and such.
>
> It mainly happens to files which are created on the NFS server itself,
> but it's not a hard rule.
>
> Keeping inodes in the server's cache seems to prevent the problem from happening.
> ( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )
I'd guess that is just because they don't have to be read from disk, where they
get corrupted.
> Hmmm, it seems to affect files whose inodes are quite near each other;
> let's check that:
>
> Let's build up an inode "database"
>
> # find /data -printf '%i %p\n' > /root/inodesnumbers
>
>
> Let's check how the inode numbers are distributed:
>
> # cat /root/inodesnumbers | perl -e 'use Data::Dumper; my @pof; while(<>){my ( $inode ) = ( $_ =~ /^(\d+)/ ); my $hop = int($inode/1000000); $pof[$hop]++; }; for (0 .. $#pof) { print $_." = ".($pof[$_]/10000)."%\n" }'
> [... lot of quite unused inodes groups]
> 53 = 3.0371%
> 54 = 26.679% <= mailboxes
> 55 = 2.7026%
> [... lot of quite unused inodes groups]
> 58 = 1.3262%
> 59 = 27.3211% <= mailing lists archives
> 60 = 5.5159%
> [... lot of quite unused inodes groups]
> 171 = 0.0631%
> 172 = 0.1063%
> 173 = 27.2895% <=
> 174 = 44.0623% <=
> 175 = 45.6783% <= websites files
> 176 = 45.8247% <=
> 177 = 36.9376% <=
> 178 = 6.3294%
> 179 = 0.0442%
>
> Hmmm, all the files are using the same inode "groups"
> (groups of a million inodes).
Interesting, but it may well be explained just by the way these files get
created / updated.
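For what it's worth, the same per-million binning can be written a bit more
readably with awk (a sketch against the same /root/inodesnumbers file; the
division by 10000 turns the count into the utilization percentage of a
one-million-inode bucket):

  awk '{ b = int($1 / 1000000); n[b]++; if (b > max) max = b }
       END { for (i = 0; i <= max; i++)
               if (i in n) printf "%d = %.4f%%\n", i, n[i] / 10000 }' /root/inodesnumbers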
> We usually fix broken folders by moving them to a quarantine folder and
> restoring the disappeared files from backup.
>
> So, let's check the corrupted inode numbers in the quarantine folder:
>
> root@...ooka:/data/path/to/rep/of/quarantine/folders# find . -mindepth 1 -maxdepth 1 -printf '%i\n' | sort -n
> 174293418
> 174506030
> 174506056
> 174506073
> 174506081
> 174506733
> 174507694
> 174507708
> 174507888
> 174507985
> 174508077
> 174508083
> 176473056
> 176473062
> 176473064
>
> Hmm... those are quite near each other (17450..., 17647...) and are, of
> course, in the most heavily used inode "groups"...
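If it helps to double-check that, debugfs can show which block group each of
those inodes lives in, e.g. (a sketch using debugfs's imap command with two of
the inode numbers listed above):

  debugfs -R 'imap <174293418>' /dev/md10
  debugfs -R 'imap <176473056>' /dev/md10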
>
>
> Open question: can NFS clients steal inode numbers from each other?
>
>
> I am not sure whether my bug report is good; feel free to ask questions ;)
Honza
--
Jan Kara <jack@...e.cz>
SuSE CR Labs