Message-ID: <CAFZ0FUXK89LLOmRGuXWW1UAZo0ydTi_O=383E+nsCywQgYh6eQ@mail.gmail.com>
Date: Sun, 22 Jan 2012 11:31:31 +0800
From: Robin Dong <hao.bigrat@...il.com>
To: Amir Goldstein <amir73il@...il.com>
Cc: Theodore Tso <tytso@....edu>, Tao Ma <taoma.tm@...il.com>,
coly <colyli@...il.com>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Yongqiang Yang <xiaoqiangnk@...il.com>
Subject: Re: Question about writable ext4-snapshot
2012/1/22 Amir Goldstein <amir73il@...il.com>:
> On Sat, Jan 21, 2012 at 6:24 AM, Theodore Tso <tytso@....edu> wrote:
>>
>> On Jan 20, 2012, at 9:45 PM, Robin Dong wrote:
>>
>>> Hello, Amir
>>>
>>> I have recently been evaluating ext4-snapshot (on GitHub) for TAOBAO.
>>> Snapshots of an ext4 fs are currently read-only, but we need to write
>>> data into a snapshot.
>>> We also want to use ext4-snapshot to do online fsck on Hadoop
>>> clusters, but our Hadoop clusters currently use no-journal ext4.
>>> So we have some questions:
>>>
>>> 1. Would it be possible to implement a writable ext4-snapshot?
>>> 2. Would it be possible to snapshot a no-journal ext4 fs?
>>> 3. What are the difficult points in implementing the above?
>>
>
> Hello Robin,
>
> 1. Writable snapshots (snapshot clones) are actually quite simple to implement
> (a sparse file containing all changes from a read-only snapshot).
> The real challenge is how to support snapshots of these clones and how to
> implement space reclaim efficiently (time-wise) when deleting snapshots.
> Indeed, the LVM thin-provisioning target handles space reclaim very efficiently.
>
> 2. I think it is possible, but I never looked into it, so there may
> be challenges that I haven't foreseen.
> The obvious problem is that snapshots will not be reliable after a crash.
> JBD ensures that metadata is not overwritten on disk before it is copied
> to the snapshot, but without a journal, after a crash, metadata could
> already have been written, and you lose the origin data that was supposed
> to be copied to the snapshot.
>
> 3. I think I have already answered that question above, but the actual
> difficulty really depends on your specific needs.
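To check my understanding of point 1, here is a minimal sketch in Python of a
clone as "a sparse file containing all changes from a read-only snapshot".
It is only a toy model, not ext4-snapshot code: the class SnapshotClone, its
methods, and the 4096-byte block size are all hypothetical, and a dict stands
in for the sparse file.

```python
class SnapshotClone:
    """Toy writable view over a read-only snapshot (hypothetical names).

    Only blocks that were written through the clone are stored; every
    other block is read from the underlying snapshot.
    """

    def __init__(self, snapshot_blocks, block_size=4096):
        self._base = snapshot_blocks   # read-only snapshot: sequence of bytes objects
        self._delta = {}               # block number -> bytes; stands in for the sparse file
        self.block_size = block_size

    def read(self, blockno):
        # Changed blocks come from the delta, unchanged ones from the snapshot.
        return self._delta.get(blockno, self._base[blockno])

    def write(self, blockno, data):
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        # The snapshot itself is never modified; only the delta grows.
        self._delta[blockno] = bytes(data)

    def allocated_blocks(self):
        # Space used by the clone is proportional to the number of changed blocks.
        return len(self._delta)


# Example: a 4-block snapshot, with one block changed through the clone.
base = tuple(bytes([i]) * 4096 for i in range(4))
clone = SnapshotClone(base)
clone.write(2, b"x" * 4096)
```

Reads fall through to the snapshot until a block is written, so the clone
costs space only for changed blocks. The sketch does not touch the harder
parts mentioned above (snapshots of clones, and space reclaim on delete).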
>
>> Something else to consider is the device mapper thin-provisioning approach. This approach does the snapshotting at the device-mapper layer, which means it is separate from the file system. It relies on using the discard request when the file is unlinked to know when blocks can be released from the snapshot. It also uses a granularity much smaller than that of the traditional LVM-style snapshots.
>>
>> This code will still need a few months to mature (the thin-provisioning code just got merged into 3.2, but discard support isn't done yet, and the userspace support is lagging). But in the long run, this might be a very attractive way of providing multiple levels of writeable snapshots, in a clean and relatively simple way.
>>
>
> There are some lengthy threads about LVM thinp vs. Ext4 snapshots here:
> http://thread.gmane.org/gmane.comp.file-systems.ext4/25968/focus=26056
> and here:
> http://thread.gmane.org/gmane.comp.file-systems.ext4/26041
>
> At the end of the day, the thinp target is a very powerful tool, but it
> does not fit all use cases. In particular, it fragments the on-disk layout
> of ext4 metadata, and benchmark results for how this affects performance
> were never published.
>
> Also, thinp needs to store quite a lot of metadata for the mapping of all
> thinp blocks, and in order to keep this metadata durable without hurting
> write performance you will almost certainly need to store it on an SSD.
> That is not a bad solution for a high-end server, but I am not sure
> everyone can afford it.
>
> Amir.
Thanks for all your suggestions!
I will evaluate both thin provisioning and ext4-snapshot later.
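
On point 2, this is how I understand the crash problem, written as a toy
Python simulation. Everything here is an assumption for illustration: the
function names are hypothetical, the dicts stand in for on-disk blocks, and
the "unordered" case simply picks the worst write order that a no-journal
fs might allow. It does not model JBD itself.

```python
def overwrite_block(disk_origin, disk_snapshot, blockno, new_data,
                    ordered, crash_after_first_write):
    """Overwrite one origin block while a snapshot is active (toy model).

    Two on-disk writes are needed: the old contents go to the snapshot and
    the new contents go to the origin. `ordered` forces the snapshot copy
    to reach disk first; otherwise the worst-case order is used.
    """
    old_data = disk_origin[blockno]  # in-memory copy of the old contents

    def write_snapshot():
        # Preserve the old contents, unless an older copy already exists.
        disk_snapshot.setdefault(blockno, old_data)

    def write_origin():
        disk_origin[blockno] = new_data

    writes = [write_snapshot, write_origin] if ordered else [write_origin, write_snapshot]
    writes[0]()
    if crash_after_first_write:
        return  # the in-memory old_data is gone after the crash
    writes[1]()


def snapshot_view(disk_origin, disk_snapshot, blockno):
    # The snapshot shows its own copy if it has one, else the origin block.
    return disk_snapshot.get(blockno, disk_origin[blockno])


def run(ordered, crash):
    origin, snap = {7: b"old"}, {}
    overwrite_block(origin, snap, 7, b"new", ordered, crash)
    return snapshot_view(origin, snap, 7)
```

With ordered writes the snapshot still sees the old block after a crash;
with the unordered worst case and a crash in between, the snapshot sees the
new data and the old contents are lost.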
--
Best Regards
Robin Dong