Message-id: <AF87FDA9-09D0-421D-9459-94310206B4EB@sun.com>
Date:	Fri, 22 Jan 2010 12:06:11 -0700
From:	Andreas Dilger <adilger@....com>
To:	Ric Wheeler <rwheeler@...hat.com>
Cc:	Eric Sandeen <sandeen@...hat.com>, tytso@....edu,
	ext4 development <linux-ext4@...r.kernel.org>,
	Bill Nottingham <notting@...hat.com>
Subject: Re: [PATCH] default max mount count to unused

On 2010-01-22, at 11:57, Ric Wheeler wrote:
> On 01/22/2010 01:40 PM, Andreas Dilger wrote:
>>> Reboot time:
>>> (1) Try to mount the file system
>>> (2) On mount failure, fsck the failed file system
>>
>> Well, this is essentially what already happens with e2fsck today, though
>> it correctly checks the filesystem for errors _first_, and _then_ mounts
>> the filesystem. Otherwise it isn't possible to fix the filesystem after
>> mount, and mounting a filesystem with errors is a recipe for further
>> corruption and/or a crash/reboot cycle.
>
> I think that we have to move towards an assumption that our
> journalling code actually works - the goal should be that we can
> *always* mount after a crash or clean reboot. That should be the
> basic test case - pound on a file system, drop power to the storage
> (and/or server) and then on reboot, try to remount. Verification
> would be in the QA test case: unmount and fsck to make sure our
> journal was robust.

I think you are missing an important fact here.  While e2fsck _always_  
runs on a filesystem at boot time (or at least this is the recommended  
configuration), this initial e2fsck run is only doing a very minimal  
amount of work (i.e. it is NOT a full "e2fsck -f" run).  It checks  
that the superblock is sane, it recovers the journal, and it looks for  
error flags written to the journal and/or superblock.  If all of those  
tests pass (i.e. less than a second of work) then the e2fsck run  
passes (excluding periodic checking, which IMHO is the only issue  
under discussion here).
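
As a rough illustration of the quick pass described above, here is a minimal
Python sketch, not the actual e2fsck logic. It reads three fields of the
on-disk ext2/3/4 superblock (s_magic, s_state, s_feature_incompat) and decides
whether the journal needs replay and whether a full check is called for. The
function names and make_superblock() are hypothetical helpers for illustration,
and the real e2fsck does more than this (e.g. trying backup superblocks).

```python
import struct

SUPERBLOCK_OFFSET = 1024    # the primary superblock starts 1024 bytes into the device
SUPERBLOCK_SIZE = 1024
EXT2_SUPER_MAGIC = 0xEF53   # s_magic, little-endian u16 at offset 0x38
EXT2_VALID_FS = 0x0001      # s_state bit (offset 0x3A): cleanly unmounted
EXT2_ERROR_FS = 0x0002      # s_state bit (offset 0x3A): errors were detected
INCOMPAT_RECOVER = 0x0004   # s_feature_incompat bit (offset 0x60): journal needs replay


def read_superblock(path):
    """Read the raw primary superblock from a device or image file."""
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return dev.read(SUPERBLOCK_SIZE)


def quick_check(sb):
    """Return (replay_journal, full_check_needed) for a raw superblock."""
    if len(sb) < SUPERBLOCK_SIZE:
        return (False, True)            # truncated superblock: not sane
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT2_SUPER_MAGIC:
        return (False, True)            # superblock is not sane
    (state,) = struct.unpack_from("<H", sb, 0x3A)
    (incompat,) = struct.unpack_from("<I", sb, 0x60)
    replay_journal = bool(incompat & INCOMPAT_RECOVER)
    full_check_needed = bool(state & EXT2_ERROR_FS)
    return (replay_journal, full_check_needed)


def make_superblock(state=EXT2_VALID_FS, incompat=0):
    """Hypothetical helper: build an in-memory superblock for experiments."""
    sb = bytearray(SUPERBLOCK_SIZE)
    struct.pack_into("<H", sb, 0x38, EXT2_SUPER_MAGIC)
    struct.pack_into("<H", sb, 0x3A, state)
    struct.pack_into("<I", sb, 0x60, incompat)
    return bytes(sb)
```

On a real system, quick_check(read_superblock(...)) needs read access to the
block device; "dumpe2fs -h" reports the same superblock fields in readable form.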

> Note that this is a technique that I have used in the past (with
> reiserfs) at large scale, in actual deployments of hundreds of
> thousands of file systems. It does work pretty well in practice.
>
> The key here is that any fsck can be a huge delay, pretty much  
> unacceptable in production shops, where they might have multiple  
> file systems per box.

No, there is no delay if the filesystem does not have any errors.  I  
consider the lack of ANY minimal boot-time sanity checking a serious  
problem with reiserfs and advised Hans many times to have minimal  
sanity checks at boot.

The problem is that if the kernel (or a background snapshot e2fsck)
detects an error, then the only way it can force a full check to
correct it is to do this on the next boot, by storing some information
in the superblock.  If the filesystem is mounted at boot time without
even a minimal check for such error flags in the superblock then the  
error may never be corrected, and in fact may cause cascading  
corruption elsewhere in the filesystem (e.g. corrupt bitmaps, bad  
indirect block pointers, etc).
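
The life cycle of such an error flag can be sketched as follows. This is a
simplified model under stated assumptions, not kernel or e2fsck code: it only
manipulates the EXT2_ERROR_FS bit of s_state in an in-memory superblock, and
record_error() and boot() are hypothetical names standing in for "the kernel
detected corruption" and "the boot scripts ran".

```python
import struct

S_STATE_OFFSET = 0x3A     # s_state: little-endian u16 in the superblock
EXT2_VALID_FS = 0x0001    # cleanly unmounted
EXT2_ERROR_FS = 0x0002    # errors detected, full check wanted


def get_state(sb):
    return struct.unpack_from("<H", sb, S_STATE_OFFSET)[0]


def set_state(sb, state):
    struct.pack_into("<H", sb, S_STATE_OFFSET, state & 0xFFFF)


def record_error(sb):
    """Model of the running system noting an error for the next boot."""
    set_state(sb, get_state(sb) | EXT2_ERROR_FS)


def boot(sb, check_flags):
    """Model of one boot; check_flags=False mounts without looking."""
    if check_flags and get_state(sb) & EXT2_ERROR_FS:
        # Stand-in for a full "e2fsck -f" run that repairs the
        # filesystem and clears the flag on success.
        set_state(sb, get_state(sb) & ~EXT2_ERROR_FS)
        return "full check, then mount"
    return "mount"


sb = bytearray(1024)
set_state(sb, EXT2_VALID_FS)
record_error(sb)
first = boot(sb, check_flags=False)       # flag is ignored and stays set
still_flagged = bool(get_state(sb) & EXT2_ERROR_FS)
second = boot(sb, check_flags=True)       # flag is seen, full check runs
third = boot(sb, check_flags=True)        # nothing left to do
```

Without the flag check the error survives every reboot; with it, one full
check runs and later boots go back to the sub-second path.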

If the filesystem is mounted with "errors=panic" (often done with HA  
configs to allow failover in case of node/cable/HBA errors) then any  
corruption on disk will just result in the node getting stuck in a  
reboot cycle with no automated e2fsck being run.
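
A toy model of that reboot cycle, assuming (purely for illustration) that the
corruption is hit on every mount and that a pre-mount e2fsck always repairs
it. boot_cycle() and its parameters are hypothetical, not part of any tool.

```python
def boot_cycle(corrupt, fsck_before_mount, max_boots=5):
    """Return the boot number at which the node stays up, or -1 if it
    is still panicking after max_boots attempts."""
    for boot in range(1, max_boots + 1):
        if fsck_before_mount and corrupt:
            corrupt = False     # stand-in for a successful e2fsck repair
        if corrupt:
            continue            # errors=panic: mount, hit the error, panic, reboot
        return boot
    return -1
```

With these assumptions, boot_cycle(True, fsck_before_mount=False) never comes
up, while boot_cycle(True, fsck_before_mount=True) stays up on the first boot.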

>>> While up and running, do a periodic check with the snapshot trick.
>>
>> Yes, this is intended to reset the periodic mount/time counter to avoid
>> the non-error boot-time check. If that is not running correctly then the
>> periodic check would still be done as a fail-safe measure.


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

