Message-ID: <Pine.LNX.4.64.0805140623001.14334@cobra.newdream.net>
Date: Wed, 14 May 2008 06:35:19 -0700 (PDT)
From: Sage Weil <sage@...dream.net>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
Cc: Jeff Garzik <jeff@...zik.org>, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: POHMELFS high performance network filesystem. Transactions,
failover, performance.
> > What is your opinion of the Paxos algorithm?
>
> It is slow. But it does solve failure cases.
For writes, Paxos is actually more or less optimal (in the non-failure
cases, at least). Reads are trickier, but there are ways to keep that
fast as well. FWIW, Ceph extends basic Paxos with a leasing mechanism to
keep reads fast, consistent, and distributed. It's only used for cluster
state, though, not file data.
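A rough illustration of how a read lease keeps reads local yet consistent. This is a generic lease scheme assumed for illustration, not Ceph's actual monitor code. The names `Replica`, `Leader`, `lease_duration` and `renew_leases` are hypothetical, and the Paxos round itself is reduced to "a majority of replicas is up":

```python
from dataclasses import dataclass, field


class LeaseExpired(Exception):
    """Raised when a replica is asked for a local read without a valid lease."""


@dataclass
class Replica:
    """Hypothetical replica holding a copy of the committed cluster state."""
    name: str
    state: dict = field(default_factory=dict)
    version: int = 0           # last committed write this replica applied
    lease_expiry: float = 0.0  # local reads allowed while now < lease_expiry
    up: bool = True

    def can_serve_locally(self, now: float) -> bool:
        return self.up and now < self.lease_expiry

    def read(self, key: str, now: float):
        # Fast path: answer from local state, no messages to other nodes.
        if self.can_serve_locally(now):
            return self.state.get(key)
        # Slow path (not shown): ask the leader or wait for a fresh lease.
        raise LeaseExpired(f"{self.name}: no valid lease at t={now}")


class Leader:
    """Hypothetical leader that commits writes and hands out read leases."""

    def __init__(self, replicas: list, lease_duration: float):
        self.replicas = replicas
        self.lease_duration = lease_duration
        self.version = 0

    def _has_majority(self) -> bool:
        alive = sum(r.up for r in self.replicas)
        return alive >= len(self.replicas) // 2 + 1

    def write(self, key: str, value, now: float) -> None:
        # Invalidate outstanding leases before changing state, so no replica
        # can serve a value this write is about to make stale. In-process
        # shortcut: over a network the leader must instead wait for leases
        # to expire or for holders to acknowledge the revocation.
        for r in self.replicas:
            r.lease_expiry = 0.0
        if not self._has_majority():
            raise RuntimeError("no majority of replicas up; write cannot commit")
        # Stand-in for a Paxos round: apply to every live replica and
        # grant each of them a fresh lease.
        self.version += 1
        for r in self.replicas:
            if r.up:
                r.state[key] = value
                r.version = self.version
                r.lease_expiry = now + self.lease_duration

    def renew_leases(self, now: float) -> None:
        # Periodic renewal keeps reads on the fast path between writes.
        # Only replicas that hold the latest committed state get a lease.
        if not self._has_majority():
            return
        for r in self.replicas:
            if r.up and r.version == self.version:
                r.lease_expiry = now + self.lease_duration
```

The lease length is the main knob: longer leases mean fewer renewals and more reads on the fast path, but a write (or a newly elected leader) has to wait out any lease that cannot be explicitly revoked.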
I think the larger issue with Paxos is that I've yet to meet anyone who
wants their data replicated 3 ways (this despite newfangled 1TB+ disks not
having enough bandwidth to actually _use_ the data they store).
Similarly, if only 1 out of 3 replicas is surviving, most people want to
be able to read their data, while Paxos demands a majority to ensure it is
correct. (This is why Paxos is typically used only for critical cluster
configuration/state, not regular data.)
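The majority requirement is simple arithmetic. The sketch below (function names are illustrative, not taken from Ceph or POHMELFS) computes the quorum size for n replicas, checks that any two quorums overlap (which is what lets Paxos guarantee correctness), and shows why one surviving replica out of three cannot make progress:

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Replicas that can fail while a majority is still reachable."""
    return n - majority(n)  # equals (n - 1) // 2


def quorums_intersect(n: int) -> bool:
    """Two majorities of n replicas must share at least one member."""
    return 2 * majority(n) > n


def can_make_progress(n: int, alive: int) -> bool:
    return alive >= majority(n)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"n={n}: majority={majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
    # n=2: majority=2, tolerates 0 failure(s)
    # n=3: majority=2, tolerates 1 failure(s)
    # n=5: majority=3, tolerates 2 failure(s)
    print(can_make_progress(3, alive=1))  # False
```

Note the n=2 case: two-way replication gains no fault tolerance under a majority rule, which is why Paxos deployments start at three replicas.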
sage