Message-ID: <482B4D80.5080808@garzik.org>
Date: Wed, 14 May 2008 16:37:20 -0400
From: Jeff Garzik <jeff@...zik.org>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
CC: Jamie Lokier <jamie@...reable.org>, Sage Weil <sage@...dream.net>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: POHMELFS high performance network filesystem. Transactions, failover,
performance.
Evgeniy Polyakov wrote:
> No, server to connect is the server, which stores data. By addition it
> will also store it to some other places according to distributed
> algorithm (like weaver, raid, mirror, whatever).
[...]
> Sure the less number of machines between client and storage we have, the
> faster and more robust we are.
>
> Either client has to write data to all servers, or it has to write it to
> one and wait until that server will broadcast it further (to quorum or any
> number of machines it wants). Having pure client to think to what
> servers it has to put its data is a bit wrong (if not saying more),
> since it has to join not only data network, but also control one, to
> check that some servers are alive or not, to be able not to race, when
> server is recovering and so on...

Quite true. It is a trade-off: additional complexity in the client
permits reduced latency and increased throughput. But is the additional
complexity -- including administrative and access control headaches --
worth it? As you say, the "complex" clients must join the data network.

Hardware manufacturers are putting so much effort into zero-copy and
RDMA. The client-to-many approach mimics that trend by minimizing
latency and data copying (and permitting use of more exotic or unusual
hardware).

But the client-to-many approach is not as complex as you make out. A
key attribute is simply for a client to be able to store new objects and
metadata on multiple servers in parallel. Once the data is stored
redundantly, the metadata controller may take quick action to
commit/abort the transaction. You can even shortcut the process further
by having the replicas send confirmations to the metadata controller.
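
A minimal sketch of that flow, to make it concrete. Everything here is
hypothetical and not taken from POHMELFS or any real system: the Replica
class is an in-memory stand-in for a storage server, client_store() plays
both the client and the metadata controller, and "required" is an assumed
redundancy threshold. The client stages the object on all replicas in
parallel, and the commit/abort decision is then made from the
acknowledgements collected.

```python
from concurrent.futures import ThreadPoolExecutor


class Replica:
    """Hypothetical in-memory stand-in for one storage server."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.staged = {}     # objects written but not yet committed
        self.committed = {}  # objects visible after a commit

    def stage(self, key, data):
        # Simulates a network write; an unreachable server raises.
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        self.staged[key] = data
        return self.name

    def finish(self, key, commit):
        # Promote the staged object on commit, discard it on abort.
        data = self.staged.pop(key, None)
        if commit and data is not None:
            self.committed[key] = data


def client_store(replicas, key, data, required):
    """Stage on all replicas in parallel, then commit or abort."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(r.stage, key, data) for r in replicas]
    # Leaving the "with" block waits for every write to finish.
    acks = [f.result() for f in futures if f.exception() is None]
    # Assumed controller rule: commit only if enough copies exist.
    commit = len(acks) >= required
    for r in replicas:
        r.finish(key, commit)
    return commit, sorted(acks)
```

In the shortcut variant, the replicas would report their acknowledgements
to the controller directly instead of routing them back through the
client; the decision rule stays the same.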

That said, the biggest distributed systems seem to inevitably grow their
own "front end server" layer. Clients connect to N caching/application
servers, each of which behaves as you describe: the caching/app server
connects to the control and data networks, and performs the necessary
load/store operations.

Personally, I think the simplest thing for _users_ is where
semi-smart clients open multiple connections to an amorphous cloud of
servers, where the cloud is self-optimizing, self-balancing, and
self-repairing internally.

Jeff