Message-ID: <alpine.DEB.1.10.0905022256230.15782@asgard>
Date: Sat, 2 May 2009 23:24:48 -0700 (PDT)
From: david@...g.hm
To: Neil Brown <neilb@...e.de>
cc: Philipp Reisner <philipp.reisner@...bit.com>,
linux-kernel@...r.kernel.org, Jens Axboe <jens.axboe@...cle.com>,
Greg KH <gregkh@...e.de>,
James Bottomley <James.Bottomley@...senPartnership.com>,
Sam Ravnborg <sam@...nborg.org>, Dave Jones <davej@...hat.com>,
Nikanth Karthikesan <knikanth@...e.de>,
Lars Marowsky-Bree <lmb@...e.de>,
"Nicholas A. Bellinger" <nab@...ux-iscsi.org>,
Kyle Moffett <kyle@...fetthome.net>,
Bart Van Assche <bart.vanassche@...il.com>,
Lars Ellenberg <lars.ellenberg@...bit.com>
Subject: Re: [PATCH 00/16] DRBD: a block device for HA clusters
I am not a DRBD developer, but I can answer some of your questions below.
On Sun, 3 May 2009, Neil Brown wrote:
> On Thursday April 30, philipp.reisner@...bit.com wrote:
>> Hi,
>>
>> This is a repost of DRBD, to keep you updated about the ongoing
>> cleanups and improvements.
>>
>> Patch set attached. Git tree available:
>> git pull git://git.drbd.org/linux-2.6-drbd.git drbd
>>
>> We are looking for reviews!
>>
>> Description
>>
>> DRBD is a shared-nothing, synchronously replicated block device. It
>> is designed to serve as a building block for high availability
>> clusters and in this context, is a "drop-in" replacement for shared
>> storage. Simplistically, you could see it as a network RAID 1.
>
> I know this is minor, but it bugs me every time I see that phrase
> "shared-nothing". Surely the network is shared??
the logical network(s) as a whole are shared, but physically they can be
redundant, multi-pathed, etc.
> And the code...
> Can you just say "DRBD is a synchronously replicated block device"?
> or would we have to call it SRBD then?
> Or maybe "shared-nothing" is an accepted technical term in the
> clustering world??
DRBD can be configured to be synchronous or asynchronous.
'shared-nothing' is an accepted technical term in the clustering world for
when two systems do not depend on any single shared device.
in the case of a network, I commonly set up systems where the network has
two switches (connected together with fiber, so that an electrical problem
in one switch cannot short out the other), with the primary box plugged
into one switch and the backup box plugged into the other. I also make sure
that my primary and backup systems are in separate racks, so that if
something goes wrong in one rack that causes an excessive amount of heat,
it won't affect the backup systems (and yes, this has happened to me when
I got lazy and stopped checking on this).
at this point the network switch is not shared (although the logical
network is).
in the case of disk storage, the common situation is 'shared-disk', where
you have one disk array and both machines are plugged into it.
this gives you a single point of failure if the disk array crashes (even
with redundant controllers, power supplies, etc., things still happen),
and the disk array can only be in one physical location.
DRBD lets you logically set up your systems as if they were a 'shared-disk'
architecture, but with the hardware being 'shared-nothing'.
you can have the two halves of the cluster in different states, so that
even a major disaster like an earthquake won't kill the system (a classic
case of 'shared-nothing').
>>
>> 1) Think of a two node HA cluster. Node A is active ('primary' in DRBD
>> speak) has the filesystem mounted and the application running. Node B is
>> in standby mode ('secondary' in DRBD speak).
>
> If there some strong technical reason to only allow 2 nodes? Was it
> Asimov who said the only sensible numbers were 0, 1, and infinity?
> (People still get surprised that md/raid1 can do 2 or 3 or n drives,
> and that md/raid5 can handle just 2 :-)
in this case we have 1 replica (or '1 other machine'), so we are on an
'interesting number' ;-)
many people would love to see DRBD extended beyond this, but my
understanding is that doing so is non-trivial.
>> DRBD can also be used in dual-Primary mode (device writable on both
>> nodes), which means it can exhibit shared disk semantics in a
>> shared-nothing cluster. Needless to say, on top of dual-Primary
>> DRBD, utilizing a cluster file system is necessary to maintain
>> cache coherency.
>>
>> More background on this can be found in this paper:
>> http://www.drbd.org/fileadmin/drbd/publications/drbd8.pdf
>>
>> Beyond that, DRBD addresses various issues of cluster partitioning,
>> which the MD/NBD stack, to the best of our knowledge, does not
>> solve. The above-mentioned paper goes into some detail about that as
>> well.
>
> Agreed - MD/NBD could probably be easily confused by cluster
> partitioning, though I suspect that in many simple cases it would get
> it right. I haven't given it enough thought to be sure. I doubt the
> enhancements necessary would be very significant though.
think of two different nodes doing writes directly to their own side of
the mirror: the system needs to notice this happening and copy the data to
the other half of the mirror (with GFS running above you to coordinate the
two writers and make sure they don't make conflicting writes).
it's not a trivial task.
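To illustrate why the partitioned dual-writer case is hard, here is a minimal sketch (plain Python, not DRBD code) of two replicas that each accept a write while the replication link is down. The `Node` class and its fields are invented for illustration; the point is that after the partition heals, the copies have diverged from their common ancestor and neither side alone can say whose data should win:

```python
class Node:
    """Toy stand-in for one half of a replicated block device."""

    def __init__(self, name):
        self.name = name
        self.data = b"consistent"  # common ancestor state on both nodes
        self.generation = 0        # bumped on every local write

    def local_write(self, payload):
        # A write that lands only on this node because the link is down.
        self.data = payload
        self.generation += 1


a = Node("node-A")
b = Node("node-B")

# Replication link fails; both nodes keep accepting writes (split brain).
a.local_write(b"written on A")
b.local_write(b"written on B")

# When the link comes back, the generation counters alone cannot pick a
# winner: both sides moved exactly one step past the common ancestor.
assert a.generation == b.generation == 1
assert a.data != b.data  # divergence: some resync policy must decide
```

Without a coordinator such as a cluster filesystem (or fencing) above the mirror, resolving this divergence means discarding one side's writes, which is exactly the problem a naive MD/NBD stack would not notice.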
David Lang