linux-ext4 - e2fsck running for days ?


Message-ID: <A1w2r2ANU6nTEfxVaC0aqzqNAaAXx90GTcISmvIbtTBL6PrL65fwaEgfzE8-_UX9WSCKm8AhNX5sEWCXNXj8vlU-c9qH31NxBdw-_5hA9lM=@protonmail.com>
Date:   Mon, 07 May 2018 13:35:58 -0400
From:   Daniel Beck <am500@...tonmail.com>
To:     "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: e2fsck running for days ?

History:

The problem started when a power failure in an enclosure kicked HDDs out
of the mdadm Linux software RAID6 array.
Array: 12x8TB RAID6, usable space about 80TB. On top of it is dm-crypt
with ext4.
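
As a quick sanity check on the capacity figure: RAID6 keeps two drives'
worth of space for parity, so usable capacity is (drives - 2) x drive size.
A minimal Python sketch, assuming nothing beyond the numbers above (the
function name is only illustrative, and md metadata and dm-crypt header
overhead are ignored):

```python
def raid6_usable_tb(drives, drive_tb):
    """Usable capacity of a RAID6 array in TB.

    Two drives' worth of space is consumed by parity (P and Q).
    Metadata overhead (md superblock, dm-crypt header) is ignored.
    """
    if drives < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (drives - 2) * drive_tb


# 12 drives of 8 TB each, as in the array described above.
print(raid6_usable_tb(12, 8))  # prints 80
```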

I was able to reassemble (not recreate) the array without any problems,
and the RAID array is clean again, verified with a resync.

>/usr/src/e2fsprogs-1.44.1/build/e2fsck/e2fsck -vv -y -C0 /dev/mapper/raid6
>e2fsck 1.44.1 (24-Mar-2018)
>ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
>/usr/src/e2fsprogs-1.44.1/build/e2fsck/e2fsck: Group descriptors look bad... trying backup blocks...
>/dev/mapper/raid6 was not cleanly unmounted, check forced.
>Pass 1: Checking inodes, blocks, and sizes

This gets to about 15% according to the progress bar, at an I/O speed of 400-500MB/s.

>Timestamp(s) on inode 262044870 beyond 2310-04-04 are likely pre-1970.         
>Fix? yes
>Timestamp(s) on inode 262044874 beyond 2310-04-04 are likely pre-1970.
>Fix? yes
...
>Inode 262044885 seems to contain garbage.  Clear? yes
>Inode 262044886 seems to contain garbage.  Clear? yes
>Inode 262044887 seems to contain garbage.  Clear? yes
...

Now the really time-consuming part:

>Inode 262043881 block 62 conflicts with critical metadata, skipping block checks.
>Inode 262043881 block 1666712512 conflicts with critical metadata, skipping block checks.
>Inode 262043881 block 1956119388 conflicts with critical metadata, skipping block checks.
...

Thousands of similar lines follow with the same inode and different blocks
before it moves on to another inode.

>Inode 262044698 block 256 conflicts with critical metadata, skipping block checks.
>Inode 262044698 block 8192 conflicts with critical metadata, skipping block checks.
>Inode 262044698 block 100663308 conflicts with critical metadata, skipping block checks.
>Inode 262044698 block 33554432 conflicts with critical metadata, skipping block checks.
...

It takes 2-8 hours for a single inode, at an I/O speed of about 250kb/s.
RAM usage of e2fsck goes straight to 60GB (of 64) within about 10 hours, and swap
usage (128GB total) is constantly increasing and therefore impacting the whole server.
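
The per-inode figure above gives a rough way to bound the remaining time.
This is only a sketch: the number of inodes that trigger these conflicts is
not known, so the count used below is purely hypothetical, and the function
name is illustrative.

```python
def pass1_hours(inodes_affected, low_h=2.0, high_h=8.0):
    """Return (best, worst) total hours, at 2-8 hours per affected inode."""
    return inodes_affected * low_h, inodes_affected * high_h


# HYPOTHETICAL count: the real number of affected inodes is unknown.
best, worst = pass1_hours(20)
print(f"{best / 24:.1f} to {worst / 24:.1f} days")
```

The estimate scales linearly with the inode count, so even a few hundred
such inodes would put the run time into months.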

Some details about the filesystem, from tune2fs -l:

>Filesystem magic number:  0xEF53
>Filesystem revision #:    1 (dynamic)
>Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
>Filesystem flags:         signed_directory_hash 
>Default mount options:    user_xattr acl
>Filesystem state:         clean with errors
>Errors behavior:          Continue
>Filesystem OS type:       Linux
>Inode count:              1220921344
>Block count:              19534723840
>Reserved block count:     0
>Free blocks:              1261986920
>Free inodes:              1219097129
>First block:              0
>Block size:               4096
>Fragment size:            4096
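
For scale, the superblock counters above translate into the following
totals. This is a small Python sketch using only the values listed; since
the filesystem is damaged, the free counts may be stale, so the derived
figures are approximate at best.

```python
# Values copied from the tune2fs -l output above.
BLOCK_COUNT = 19_534_723_840
BLOCK_SIZE = 4096
FREE_BLOCKS = 1_261_986_920
INODE_COUNT = 1_220_921_344
FREE_INODES = 1_219_097_129

size_bytes = BLOCK_COUNT * BLOCK_SIZE
inodes_used = INODE_COUNT - FREE_INODES
blocks_used_pct = 100 * (BLOCK_COUNT - FREE_BLOCKS) / BLOCK_COUNT

print(f"size: {size_bytes / 1e12:.2f} TB ({size_bytes / 2**40:.2f} TiB)")
print(f"inodes in use: {inodes_used}")
print(f"blocks in use: {blocks_used_pct:.1f}%")
```

So the filesystem spans the full 80TB and is nearly full by block count,
while only a small fraction of the roughly 1.2 billion inodes is in use.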

The check has now been running for several days. Is there a way to speed things up?

Is there an estimate of how many days/weeks/months this could take?

Thanks for reading.

Regards,

Daniel