Date:	Sat, 21 Jan 2012 18:09:49 +0200
From:	Amir Goldstein <amir73il@...il.com>
To:	Robin Dong <hao.bigrat@...il.com>
Cc:	Theodore Tso <tytso@....edu>, Tao Ma <taoma.tm@...il.com>,
	coly <colyli@...il.com>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>,
	Yongqiang Yang <xiaoqiangnk@...il.com>
Subject: Re: Question about writable ext4-snapshot

On Sat, Jan 21, 2012 at 6:24 AM, Theodore Tso <tytso@....edu> wrote:
>
> On Jan 20, 2012, at 9:45 PM, Robin Dong wrote:
>
>> Hello, Amir
>>
>> I am evaluating ext4-snapshot (on github) for TAOBAO recently. The
>> snapshot of an ext4 fs is READONLY now, but we do need to write data
>> into snapshot.
>> We also want to use ext4-snapshot to do online fsck on
>> Hadoop clusters, but our Hadoop clusters are using no-journal ext4
>> now. So we have some questions:
>>
>> 1. Will it be possible to implement a writable ext4-snapshot?
>> 2. Will it be possible to snapshot a no-journal ext4 fs?
>> 3. What is the difficult part of implementing the above?
>

Hello Robin,

1. Writable snapshots (snapshot clones) are actually quite simple to implement
(a sparse file containing all changes from a read-only snapshot).
The real challenge is how to support snapshots of these clones and how to
reclaim space efficiently (time-wise) when deleting snapshots.
Indeed, the LVM thin-provisioning target handles space reclaim very efficiently.
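
To illustrate the "sparse file of changes" idea in point 1, here is a minimal
sketch in Python. It is not ext4-snapshot code: the class names, the dict-based
block store and the block numbers are all hypothetical, chosen only to show a
clone that stores just the blocks written since the read-only snapshot.

```python
class ReadOnlySnapshot:
    """Hypothetical stand-in for a read-only snapshot: block number -> bytes."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def read(self, blkno):
        return self._blocks[blkno]


class SnapshotClone:
    """Writable clone: a sparse overlay holding only the changed blocks."""

    def __init__(self, base):
        self._base = base
        self._changed = {}  # sparse: only blocks written to the clone

    def read(self, blkno):
        # Prefer the clone's own copy; otherwise fall through to the snapshot.
        if blkno in self._changed:
            return self._changed[blkno]
        return self._base.read(blkno)

    def write(self, blkno, data):
        # Writes land in the overlay and never touch the read-only snapshot.
        self._changed[blkno] = data

    def allocated_blocks(self):
        # Space used by the clone grows only with the number of changed blocks.
        return len(self._changed)
```

A snapshot of such a clone would need another overlay layered on top, and
deleting a layer means deciding which overlay blocks are still referenced,
which is where the space reclaim cost mentioned above comes from.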

2. I think it is possible, but I never looked into it, so there may
be challenges that I haven't foreseen.
The obvious problem is that snapshots will not be reliable after a crash.
JBD ensures that metadata is not overwritten on disk before it is
copied to the snapshot, but without a journal, after a crash, metadata
could already have been written, and you lose the original data that
was supposed to be copied to the snapshot.
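
The ordering problem in point 2 can be shown with a toy model. This is only a
sketch under simplifying assumptions: plain dicts stand in for the on-disk
origin and snapshot, and the function names are hypothetical, not taken from
ext4 or JBD.

```python
def snapshot_read(origin, snapshot, blkno):
    """Snapshot view: the saved copy if one exists, else the live block."""
    return snapshot.get(blkno, origin[blkno])


def overwrite_with_cow(origin, snapshot, blkno, new_data):
    """Ordered path: preserve the old block before overwriting it."""
    if blkno not in snapshot:
        snapshot[blkno] = origin[blkno]  # step 1: copy old contents
    origin[blkno] = new_data             # step 2: only now overwrite


def overwrite_then_crash(origin, snapshot, blkno, new_data):
    """Unordered path: the overwrite reaches disk, then the system
    crashes before the old block is copied to the snapshot."""
    origin[blkno] = new_data
    # crash here: the copy to the snapshot never happens
```

With the ordered path, snapshot_read() keeps returning the old contents after
the overwrite. With the unordered path it returns the new contents, so the
snapshot has silently changed. The journal is what enforces the first ordering.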

3. I think I have already answered that question above, but the actual
difficulty really depends on your specific needs.

> Something else to consider is the device mapper thin-provisioning approach.   This approach does the snapshotting at the device-mapper layer, which means it is separate from the file system.  It relies on using the discard request when the file is unlinked to know when blocks can be released from the snapshot.  It also uses a granularity much smaller than that of the traditional LVM-style snapshots.
>
> This code will still need a few months to be mature (the thin-provisioning code just got merged into 3.2, but discard support isn't done yet, and the userspace support is lagging).   But in the long run, this might be a very attractive way of providing multiple levels of writeable snapshots, in a clean and relatively simple way.
>

There are some lengthy threads about LVM thinp vs. Ext4 snapshots here:
http://thread.gmane.org/gmane.comp.file-systems.ext4/25968/focus=26056
and here:
http://thread.gmane.org/gmane.comp.file-systems.ext4/26041

At the end of the day, the thinp target is a very powerful tool, but it
does not fit all use cases. In particular, it fragments the on-disk layout
of ext4 metadata, and benchmark results showing how this affects
performance were never published.

Also, thinp needs to store quite a lot of metadata for the mapping of
all thinp blocks, and in order to keep this metadata durable without
hurting write performance you will almost certainly need to store it
on an SSD. That is not a bad solution for a high-end server, but I am
not sure everyone can afford it.
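
As a rough way to reason about the amount of mapping metadata, the
back-of-the-envelope calculation can be sketched as below. The function name
and the bytes_per_mapping parameter are assumptions made for illustration;
this is not a figure taken from the thinp on-disk format, and real numbers
depend on the pool's block size and metadata layout.

```python
def thin_mapping_estimate(volume_bytes, block_bytes, bytes_per_mapping):
    """Estimate mapping count and metadata size for a fully mapped volume.

    bytes_per_mapping is a hypothetical per-entry cost supplied by the
    caller, not a value from the thin-provisioning metadata format.
    """
    # One mapping per thin block; round up for a partial last block.
    mappings = -(-volume_bytes // block_bytes)
    return mappings, mappings * bytes_per_mapping
```

For example, a 1 TiB volume with 64 KiB blocks needs 2^40 / 2^16 = 16,777,216
mappings, and the metadata size scales linearly with whatever per-mapping
cost applies. Smaller blocks mean finer-grained snapshots but proportionally
more metadata to keep durable.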

Amir.