Message-Id: <200708021408.24876.phillips@phunq.net>
Date: Thu, 2 Aug 2007 14:08:24 -0700
From: Daniel Phillips <phillips@...nq.net>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: Distributed storage.
On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote:
> Hi.
>
> I'm pleased to announce first release of the distributed storage
> subsystem, which allows to form a storage on top of remote and local
> nodes, which in turn can be exported to another storage as a node to
> form tree-like storages.
Excellent! This is precisely what the doctor ordered for the
OCFS2-based distributed storage system I have been mumbling about for
some time. In fact, the dd in ddsnap and ddraid stands for "distributed
data". The ddsnap/raid devices do not include an actual network
transport; that is expected to be provided by a specialized block
device, which until now has been NBD. But NBD has various
deficiencies, as you note, in addition to its tendency to deadlock when
accessed locally. Your new code base may be just the thing we always
wanted. We (zumastor et al) will take it for a test drive and see if
anything breaks.
Memory deadlock is a concern, of course. From a cursory read-through,
it looks like this code is pretty vm-friendly and you have thought
quite a lot about it; however, I respectfully invite peterz
(obsessive/compulsive memory deadlock hunter) to help give it a good
going-over with me.
I see bits that worry me, e.g.:
+ req = mempool_alloc(st->w->req_pool, GFP_NOIO);
which seems to be callable in response to a local request: exactly the
case where NBD deadlocks. Your mempool strategy can work reliably only
if you can prove that the maximum number of requests you can have in
flight never needs more pool allocations than the pool holds. In other
words, if you ever take the pool's fallback path to normal allocation,
you risk deadlock.
Anyway, if this is as grand as it seems, then I think we ought to
factor out a common transfer core that NBD, iSCSI, ATAoE and your own
kernel server could all use in place of the roll-yer-own code each of
them has now.
Regards,
Daniel
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html