linux-ext4 - Re: tune2fs can't be used on a mounted ext4, or...?


Message-ID: <20110412005526.3015a238@natsu>
Date:	Tue, 12 Apr 2011 00:55:26 +0600
From:	Roman Mamedov <rm@...anrm.ru>
To:	Ted Ts'o <tytso@....edu>
Cc:	Andreas Dilger <adilger@...ger.ca>, linux-ext4@...r.kernel.org,
	linux-raid@...r.kernel.org
Subject: Re: tune2fs can't be used on a mounted ext4, or...?

On Mon, 11 Apr 2011 09:10:08 -0400
Ted Ts'o <tytso@....edu> wrote:

> Your symptoms don't sound familiar to me, other than the standard
> concerns about hardware induced file system inconsistency problems.

The thing is, I do not observe any random in-file data corruption of the kind
that would point to a problem at a lower (block-device) level, so I do not
think it is a RAID or HDD problem.

The breakage seemed to be at the filesystem logic level, perhaps something to
do with allocation of space for new files? And since immediately before that I
had performed two operations that could affect it (changing the stride size
with tune2fs and an online grow with resize2fs), I thought this might be an
ext4 problem.

While still in the same session, I then re-copied the affected files, replacing
their "shortened" copies, and they were written out fine the second time.
After a reboot, no more file truncations have been observed so far.
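
A minimal sketch of how such "shortened" copies could be detected, by comparing
file sizes between the source tree and the copied tree. The function name
find_truncated and both directory arguments are hypothetical, not something
used in this thread, and the check looks only at sizes, not at file contents.

```python
import os

def find_truncated(src_root, dst_root):
    """Return (relative_path, src_size, dst_size) for every file under
    src_root whose copy under dst_root is missing or shorter."""
    shortened = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            src_size = os.path.getsize(src)
            # A missing copy is reported with a size of -1.
            dst_size = os.path.getsize(dst) if os.path.isfile(dst) else -1
            if dst_size < src_size:
                shortened.append((rel, src_size, dst_size))
    return sorted(shortened)
```

Running it once right after the copy and again after a remount or reboot would
show whether the short files exist only in the page cache view or on disk too.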

> Have you checked your logs carefully to make sure there weren't any
> hardware errors reported?

No, there weren't any errors in dmesg, or on the same console where 'cp' would
output its errors.

> If this is a hardware RAID system, is it  regularly doing disk scrubbing?
> Has the hardware RAID reported anything unusual?  How long have you been
> running in a degraded RAID 6 state?

It is an mdadm RAID6, and it does not report any problem. It was running in a
degraded state for only a short time (less than a day). And AFAIK running
degraded without one disk is not a dangerous or risky situation with RAID6.

> And have you tried shutting down the system and running fsck to make
> sure there weren't any file system corruption problems?  When's the
> last time you've run fsck on the system?

I have unmounted it and run fsck just now. Admittedly, it had been a long time
since the last fsck.

# e2fsck /dev/md0
e2fsck 1.41.12 (17-May-2010)
/dev/md0 has gone 306 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: 367107/364412928 files (4.3% non-contiguous), 1219229259/1457626752 blocks
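
As a rough aid for reading the summary line above, a small sketch that parses
it and works out how full the filesystem is in inodes and blocks. The function
name parse_e2fsck_summary is hypothetical, and the regular expression assumes
the exact wording shown in this e2fsck 1.41.12 output; other versions may
print the line differently.

```python
import re

# Pattern for the final e2fsck summary line as it appears in this message.
SUMMARY_RE = re.compile(
    r"^(?P<dev>\S+): (?P<files_used>\d+)/(?P<files_total>\d+) files "
    r"\((?P<noncontig>[\d.]+)% non-contiguous\), "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks$"
)

def parse_e2fsck_summary(text):
    """Parse the summary line (possibly wrapped by a mail client) and
    return the device name plus inode and block usage percentages."""
    line = " ".join(text.split())  # undo line wrapping
    m = SUMMARY_RE.match(line)
    if m is None:
        raise ValueError("not an e2fsck summary line: %r" % line)
    files_used, files_total = int(m["files_used"]), int(m["files_total"])
    blocks_used, blocks_total = int(m["blocks_used"]), int(m["blocks_total"])
    return {
        "device": m["dev"],
        "inode_use_pct": 100.0 * files_used / files_total,
        "block_use_pct": 100.0 * blocks_used / blocks_total,
        "noncontiguous_pct": float(m["noncontig"]),
    }

summary = parse_e2fsck_summary(
    "/dev/md0: 367107/364412928 files (4.3% non-contiguous), "
    "1219229259/1457626752\nblocks"
)
print("%(device)s: %(block_use_pct).1f%% of blocks in use" % summary)
```

For the numbers in this message that works out to a bit under 84% of blocks
and well under 1% of inodes in use.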

> If this is an LVM system, I'd strongly suggest that you set aside
> space you can take a snapshot, and then regularly take a snapshot, and
> then run fsck on the snapshot.  If any problems are noted, you can
> then schedule downtime and fsck the entire system.

No, I don't use LVM there.

-- 
With respect,
Roman
