Message-ID: <4A4C9838.7010006@redhat.com>
Date: Thu, 02 Jul 2009 07:21:28 -0400
From: Ric Wheeler <rwheeler@...hat.com>
To: Jamie Lokier <jamie@...reable.org>
CC: Michael Rubin <mrubin@...gle.com>,
Chris Worley <worleys@...il.com>,
Shaozhi Ye <yeshao@...gle.com>, linux-fsdevel@...r.kernel.org,
linux-ext4@...r.kernel.org
Subject: Re: Plans to evaluate the reliability and integrity of ext4 against
power failures.
On 07/01/2009 10:12 PM, Jamie Lokier wrote:
> Ric Wheeler wrote:
>> One way to test this with reasonable, commodity hardware would be
>> something like the following:
>>
>> (1) Get an automated power kill setup to control your server
>
> etc. Good plan.
>
> Another way to test the entire software stack, but not the physical
> disks, is to run the entire test using VMs, and simulate hard disk
> write caching and simulated power failure in the VM. KVM would be a
> great candidate for that, as it runs VMs as ordinary processes and the
> disk I/O emulation is quite easy to modify.
Certainly, that could be useful for testing some layers of the stack. Historically,
the biggest issues that I have run across have involved the volatile
write cache on the storage targets. Not only can it lose data that has already
been acked all the way back to the host, it can also reorder that data in
ways that make file system recovery difficult.
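As a rough illustration of that failure mode, here is a toy model of a disk with
a volatile write cache. It is only a sketch: the class and method names are
made up for this example, the 50% survival chance is an arbitrary assumption,
and no real drive or emulator is being described.

```python
import random


class VolatileCacheDisk:
    """Toy disk (hypothetical): writes are acked once they reach a volatile cache."""

    def __init__(self, seed=0):
        self.media = {}   # block number -> data that survives power loss
        self.cache = []   # acked writes not yet on media, in arrival order
        self.rng = random.Random(seed)

    def write(self, block, data):
        # The host sees the write as complete, but nothing is durable yet.
        self.cache.append((block, data))

    def flush(self):
        # Cache flush / barrier: everything acked so far becomes durable.
        for block, data in self.cache:
            self.media[block] = data
        self.cache.clear()

    def power_fail(self):
        # An arbitrary subset of the cached writes reaches the media and the
        # rest is lost. A later write may survive while an earlier one does
        # not, which is the reordering problem seen from the host side.
        for block, data in self.cache:
            if self.rng.random() < 0.5:  # assumed survival chance
                self.media[block] = data
        self.cache.clear()
        return dict(self.media)


disk = VolatileCacheDisk(seed=1)
disk.write(0, "journal commit")
disk.flush()                 # block 0 is now durable
disk.write(1, "data A")      # acked, but only in the cache
disk.write(2, "data B")      # acked, but only in the cache
after = disk.power_fail()    # block 0 survives; blocks 1 and 2 may or may not
```

Only the flushed write is guaranteed to be in `after`. Running the same workload
with different seeds gives different surviving subsets, which is the kind of
variation a recovery test would want to cover.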
>
> As most issues probably are software issues (kernel, filesystems, apps
> not calling fsync, or assuming barrierless O_DIRECT/O_DSYNC are
> sufficient, network fileserver protocols, etc.), it's surely worth a look.
>
> It could be much faster than the physical version too, in other words
> more complete testing of the software stack given available resources.
>
> With the ability to "fork" a running VM's state by snapshotting it and
> continuing, it would even be possible to simulate power failure cache
> loss scenarios at many points in the middle of a stress test, with the
> stress test continuing to run - no full reboot needed at every point.
> That way, maybe deliberate trace points could be placed in the
> software stack at places where power failure cache loss seems likely
> to cause a problem.
>
> -- Jamie
I do agree that this testing would also be very useful, especially since you
can do it in almost any environment.
Regards,
Ric