Message-ID: <482B2E50.2030601@garzik.org>
Date:	Wed, 14 May 2008 14:24:16 -0400
From:	Jeff Garzik <jeff@...zik.org>
To:	Sage Weil <sage@...dream.net>
CC:	Evgeniy Polyakov <johnpol@....mipt.ru>,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	linux-fsdevel@...r.kernel.org
Subject: Re: POHMELFS high performance network filesystem. Transactions, failover,
 performance.

Sage Weil wrote:
>>> What is your opinion of the Paxos algorithm?
>> It is slow. But it does solve failure cases.
> 
> For writes, Paxos is actually more or less optimal (in the non-failure 
> cases, at least).  Reads are trickier, but there are ways to keep that 
> fast as well.  FWIW, Ceph extends basic Paxos with a leasing mechanism to 
> keep reads fast, consistent, and distributed.  It's only used for cluster 
> state, though, not file data.
> 
> I think the larger issue with Paxos is that I've yet to meet anyone who 
> wants their data replicated 3 ways (this despite newfangled 1TB+ disks not 
> having enough bandwidth to actually _use_ the data they store).  

I've seen clusters in the field that planned for this -- they don't want 
to lose their data.


> Similarly, if only 1 out of 3 replicas is surviving, most people want to 
> be able to read their data, while Paxos demands a majority to ensure it is 
> correct.

This isn't necessarily true -- it's quite easy for most applications to 
come up with an alternate method for ensuring correctness of retrieved 
data, if one assumes Paxos consensus was achieved during the write-data 
phase earlier in time.  Checksumming is a common solution, but not the 
only one.  The right choice is domain- or app-specific, of course.

Overall, reads can be optimized outside of Paxos in many ways.
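
As a rough illustration of the checksumming idea, here is a minimal Python
sketch.  It assumes the write was already agreed upon (e.g. via Paxos) and
that each replica stores a SHA-256 digest next to the data.  The names
write_block and read_block, and the use of plain dicts as replicas, are
hypothetical and only for illustration.  Note that a checksum like this
catches corruption, not a stale but intact copy; detecting staleness would
need something extra, such as a version number.

```python
import hashlib


def write_block(replicas, key, data):
    """Store data and its SHA-256 digest on every replica.

    Assumption: consensus on this write has already been reached
    elsewhere; this function only models the resulting stored state.
    """
    digest = hashlib.sha256(data).hexdigest()
    for replica in replicas:
        replica[key] = (data, digest)
    return digest


def read_block(replicas, key):
    """Return data from the first reachable replica whose checksum matches.

    A replica that is down is modeled as None.  No majority is required:
    a single surviving, uncorrupted copy is enough.
    """
    for replica in replicas:
        if replica is None or key not in replica:
            continue  # replica down, or it never stored this key
        data, digest = replica[key]
        if hashlib.sha256(data).hexdigest() == digest:
            return data
    raise IOError("no replica holds a copy that passes verification")
```

With three replicas where one is down and one is corrupted, read_block
still returns the data from the single good copy, which is the "1 out of 3
surviving" case discussed above.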


> (This is why Paxos is typically used only for critical cluster 
> configuration/state, not regular data.)

Yep, I'm working on a config daemon a la Chubby or ZooKeeper, based on 
Paxos, that does just this.  :)

	Jeff


