Message-ID: <CAGr1F2Fuuoowh59k-_e6cK1v0Z2TMEtM4FpDTf2APjmViqUu-g@mail.gmail.com>
Date: Wed, 26 Oct 2011 16:39:23 -0700
From: Aditya Kali <adityakali@...gle.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: Eric Sandeen <sandeen@...hat.com>,
Andreas Dilger <adilger@...ger.ca>,
Lukas Czerner <lczerner@...hat.com>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
Nauman Rafique <nauman@...gle.com>,
TheodoreTso <tytso@...gle.com>,
Ric Wheeler <rwheeler@...hat.com>,
"Alasdair G.Kergon" <agk@...hat.com>
Subject: Re: [RFC] Metadata Replication for Ext4
Thanks all for your feedback. Summarizing from the discussion so far,
there seem to be three main solutions suggested for replicating
metadata:
1) Use the mke2fs hack to store all metadata in the 1st block group, and
use dm and raid1 to mirror that block group (most of the metadata).
Pros: Simple approach that does not require any ext4 changes.
Cons: Added overhead of raid and device mapper will be significant
for fast SSDs
Cons: Management overhead on a large number of machines
Cons: Need to add support in raid to read from the mirror if primary fails.
2) Have a separate metadata device and access all ext4 metadata from
it. This device could be raid1 or whatever.
Pros: No need for device mapper
Pros: Solves many other problems (SSDs can be used to cache
metadata for disks, etc.)
Cons: Will need to significantly over-allocate space (running out
of space on this device potentially means no more writes to the
filesystem).
Cons: Requires a lot of ext4 code changes
3) A replica inode that resides on either the same device or an external
device (this proposal)
Pros: No need for device mapper or other additional layers
Pros: Simpler management in production
Cons: Not generic (Ext4 specific)
Cons: Complicates Ext4 for questionable gain (especially with the inode
being on the same device)
#2 seems to be the ideal solution, but it would take a substantial amount
of effort and require a lot of ext4 changes.
One other alternative that comes to mind is to have an external
"replica device" (a hybrid of ideas #2 and #3) instead of an entire
"metadata device". All metadata writes that go to the original will also
go to the replica device, and the filesystem has the option to read
from the replica first. With this, we get the benefits of #2 and
#3 without needing a lot of ext4 (or any other filesystem) changes.
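To make the replica-device idea concrete, here is a rough user-space
sketch in Python. It is not ext4 code and not a proposed interface:
BlockDevice, ReplicatedMetadata, write_metadata, read_metadata and
read_replica_first are all hypothetical names made up for illustration,
and the devices are in-memory stand-ins. It only shows the two rules
described above: every metadata write goes to both devices, and a read
tries the preferred device first and falls back to the other on error.

```python
class BlockDevice:
    """In-memory stand-in for a block device (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}      # block number -> bytes
        self.failed = False   # set to True to simulate a dead device

    def write(self, blkno, data):
        if self.failed:
            raise OSError(f"{self.name}: write error at block {blkno}")
        self.blocks[blkno] = bytes(data)

    def read(self, blkno):
        if self.failed or blkno not in self.blocks:
            raise OSError(f"{self.name}: read error at block {blkno}")
        return self.blocks[blkno]


class ReplicatedMetadata:
    """Mirror metadata writes to a replica device; read with fallback."""

    def __init__(self, primary, replica, read_replica_first=False):
        self.primary = primary
        self.replica = replica
        self.read_replica_first = read_replica_first

    def write_metadata(self, blkno, data):
        # Every metadata write goes to the original and to the replica.
        # Assumption: an error on either device is simply propagated.
        self.primary.write(blkno, data)
        self.replica.write(blkno, data)

    def read_metadata(self, blkno):
        # Try the preferred device first, then fall back to the other one.
        if self.read_replica_first:
            order = (self.replica, self.primary)
        else:
            order = (self.primary, self.replica)
        last_err = None
        for dev in order:
            try:
                return dev.read(blkno)
            except OSError as err:
                last_err = err
        raise last_err
```

For example, after fs.write_metadata(7, b"inode table"), setting
primary.failed = True still lets fs.read_metadata(7) return the block
from the replica. How a real implementation should handle a failed
replica write (fail the operation, or mark the replica stale) is left
open here.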
What do you think? Is this something that could be implemented
without much intrusion into the ext4 codebase?
Thanks,
On Fri, Oct 21, 2011 at 8:54 AM, Christoph Hellwig <hch@...radead.org> wrote:
> On Fri, Oct 21, 2011 at 10:52:11AM -0500, Eric Sandeen wrote:
>> With an SSD, you -really- don't know the independent failure domains,
>> with all the garbage collection & remapping that they may do, right?
>
> In fact some popular consumer SSDs do some fairly efficient data
> de-duplication which completely renders any metadata redundancy on a
> single one of these devices void.
>
>
--
Aditya