Date:	Thu, 12 Jul 2007 08:49:15 -0500
From:	James <LinuxKernel@...esplace.net>
To:	linux-kernel@...r.kernel.org
Subject:	Problem recovering a failed RAID5 array with 4 drives.

My apologies if this is not the correct forum. If there is a better place to 
post this, please advise.


Linux localhost.localdomain 2.6.17-1.2187_FC5 #1 Mon Sep 11 01:17:06 EDT 2006 
i686 i686 i386 GNU/Linux

(I was planning to upgrade to FC7 this weekend, but that is currently on hold 
because of the problem below.)

I've got a problem with a software RAID5 array managed by mdadm.
Drive sdc failed, which caused sda to appear failed as well, and both drives 
were marked as 'spare'.

What follows is a record of the steps I've taken and the results. I'm looking 
for some direction/advice to get the data back. 


I've tried a few cautious things to bring the array back up with the three 
good drives, with no luck.

The last thing I attempted had some limited success. I was able to get all of 
the drives powered up. I checked the Event count on the three good drives and 
they were all equal, so I assumed it would be safe to do the following; I hope 
I was not wrong. The event-count check is sketched below, followed by the 
commands I issued to try to bring the array into a usable state.
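
The check was along these lines (reconstructed from memory, so treat the exact 
invocation as approximate):

[]# for d in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
        echo "== $d =="
        mdadm --examine $d | grep -E 'Update Time|State|Events'
    done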




[]# mdadm --create --verbose /dev/md0 --assume-clean --level=raid5 \
     --raid-devices=4 --spare-devices=0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

[]# /sbin/mdadm --misc --test --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Jul 11 08:03:20 2007
     Raid Level : raid5
     Array Size : 1465175808 (1397.30 GiB 1500.34 GB)
    Device Size : 488391936 (465.77 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Jul 11 08:03:47 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : e46beb22:37d329db:dd16ea76:29c07a23
         Events : 0.2

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8        1        2      active sync   /dev/sda1
       3       8       49        3      active sync   /dev/sdd1
[]# mdadm --fail /dev/md0 /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md0

[]# /sbin/mdadm --misc --test --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Jul 11 08:03:20 2007
     Raid Level : raid5
     Array Size : 1465175808 (1397.30 GiB 1500.34 GB)
    Device Size : 488391936 (465.77 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Jul 11 14:37:56 2007
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : e46beb22:37d329db:dd16ea76:29c07a23
         Events : 0.3

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       0        0        1      removed
       2       8        1        2      active sync   /dev/sda1
       3       8       49        3      active sync   /dev/sdd1

       4       8       33        -      faulty spare   /dev/sdc1



[]# mount /dev/md0 /opt
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

In /var/log/messages
Jul 11 14:32:44 localhost kernel: EXT3-fs: md0: couldn't mount because of unsupported optional features (4000000).

[]# /sbin/fsck /dev/md0
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
fsck.ext3: Filesystem revision too high while trying to open /dev/md0
The filesystem revision is apparently too high for this version of e2fsck.
(Or the filesystem superblock is corrupt)


The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

(I ran mke2fs with -n, which only reports what it would do without writing 
anything, to find where the backup superblocks should be.)

[]# mke2fs -n /dev/md0
mke2fs 1.38 (30-Jun-2005)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
183156736 inodes, 366293952 blocks
18314697 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=369098752
11179 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
        78675968, 102400000, 214990848


I tried the following for each of the superblock backups, with the same result 
every time (a sketch of the loop I used follows the error output below).

[]# e2fsck -b 214990848 /dev/md0
e2fsck 1.38 (30-Jun-2005)
/sbin/e2fsck: Invalid argument while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
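
For completeness, the loop over the backup superblocks was roughly the 
following (again reconstructed from memory; the block numbers are the ones 
mke2fs -n reported above):

[]# for sb in 32768 98304 163840 229376 294912 819200 884736 1605632 2654208 \
        4096000 7962624 11239424 20480000 23887872 71663616 78675968 \
        102400000 214990848; do
        echo "== backup superblock $sb =="
        e2fsck -b $sb /dev/md0
    done

Every attempt ended with the same "Invalid argument while trying to open 
/dev/md0" error shown above.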


Any advice/direction would be appreciated. 
Thanks much.
