Message-ID: <44CEDF2A.6080500@slaphack.com>
Date:	Mon, 31 Jul 2006 23:57:14 -0500
From:	David Masover <ninja@...phack.com>
To:	David Lang <dlang@...italinsight.com>
CC:	tdwebste2@...oo.com, Theodore Tso <tytso@....edu>,
	Nate Diller <nate.diller@...il.com>,
	Adrian Ulrich <reiser4@...nkenlights.ch>,
	"Horst H. von Brand" <vonbrand@....utfsm.cl>, ipso@...ppymail.ca,
	reiser@...esys.com, lkml@...productions.com, jeff@...zik.org,
	linux-kernel@...r.kernel.org, reiserfs-list@...esys.com
Subject: Re: Solaris ZFS on Linux [Was: Re: the "'official' point of view"
 expressed by kernelnewbies.org regarding reiser4 inclusion]

David Lang wrote:
> On Mon, 31 Jul 2006, David Masover wrote:
> 
>>> And perhaps a
>>> really good clustering filesystem for markets that
>>> require NO downtime. 
>>
>> Thing is, a cluster is about the only FS I can imagine that could 
>> reasonably require (and MAYBE provide) absolutely no downtime. 
>> Everything else, the more you say it requires no downtime, the more I 
>> say it requires redundancy.
>>
>> Am I missing any more obvious examples where you can't have enough 
>> redundancy, but you can't have downtime either?
> 
> just because you have redundancy doesn't mean that your data is idle 
> enough for you to run a repacker with your spare cycles.

Then you don't have redundancy, at least not for reliability.  In that 
case, you have redundancy for speed.

> to run a 
> repacker you need a time when the chunk of the filesystem that you are 
> repacking is not being accessed or written to.

Reasonably, yes.  But it will be an online repacker, so it will be 
somewhat tolerant of this.

> it doesn't matter if that 
> data lives on one disk or 9 disks all mirroring the same data; you can't 
> just break off one of the copies and repack it, because by the time you 
> finish it won't match the live drives anymore.

Aha.  That really depends on how you're doing the mirroring.

If you're doing it at the block level, then no, it won't work.  But if 
you're doing it at the filesystem level (a cluster-based FS, or 
something that layers on top of an FS), or (most likely) the 
database/application level, then when you come back up, the new data is 
just pulled in from the logs as if it had been written to the FS.
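
For instance, the log-replay step might look something like this -- a 
rough, untested sketch, with all the names made up, and glossing over 
the locking and fsync discipline a real version would need:

    # Replay application-level log records onto a replica that was
    # offline while it was being repacked.  Each record is
    # (seqno, path, offset, data); the replica remembers the last
    # seqno it applied before it went offline.
    import os

    def replay_log(log_records, replica_root, last_applied):
        for seqno, path, offset, data in log_records:
            if seqno <= last_applied:
                continue  # already on disk from before the repack
            fd = os.open(os.path.join(replica_root, path),
                         os.O_WRONLY | os.O_CREAT)
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                os.write(fd, data)
            finally:
                os.close(fd)
            last_applied = seqno
        return last_applied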

The only example I can think of that I've actually used and seen working 
is MySQL tables, but that already covers a huge number of websites.

> database servers have a repacker (vacuum), and they are under tremendous 
> pressure from their users to avoid having to use it because of the 
> performance hit that it generates. (the theory in the past was exactly 
> what was presented in this thread: make things run faster most of the 
> time and accept the performance hit when you repack). the trend seems to 
> be toward a repacker thread that runs continuously, causing a small 
> impact all the time (that can be calculated into the capacity planning) 
> instead of a large impact once in a while.

Hmm, if that could be done right, it wouldn't be so bad -- if you get 
twice the performance but have to repack for 2 hours at the end of the 
week, the repacker is still a win, right?  So if you could spread those 
2 hours out over the week, in theory you'd still be pretty close to 
twice the performance.
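
The back-of-the-envelope numbers, using the figures above:

    # 2 hours of repacking amortized over a 168-hour week,
    # instead of taken as one lump at the end.
    overhead = 2.0 / 168                # about 1.2% of the week
    effective_speedup = 2.0 * (1 - overhead)
    print(effective_speedup)            # ~1.98 -- still nearly 2x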

But that is fairly difficult to do, and may be harder to do well than 
to implement, say, a Reiser4 plugin that operates at about the level of 
rsync, but runs on every file modification.

> the other thing they are seeing as new people start using them is that 
> the newbies don't realize they need to do something as archaic as 
> running a repacker periodically; as a result, they let things devolve 
> to where performance is really bad without understanding why.

Yikes.  But then, that may be a failure of distro maintainers for not 
throwing it in cron for them.

I had a similar problem with MySQL.  I turned on binary logging so I 
could do database replication, but I didn't realize I had to actually 
delete the logs.  I now have a daily cron job that wipes out everything 
except the last day's logs.  It could probably be modified pretty easily 
to run hourly, if I needed to.
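
The whole thing boils down to something along these lines (a sketch, 
not my exact script; the one-day retention is just what suits me, and 
it assumes any replication slaves have already read past the old logs):

    #!/usr/bin/env python
    # Daily cron job: ask MySQL to drop binary logs older than a day.
    import subprocess

    subprocess.check_call([
        "mysql", "-e",
        "PURGE MASTER LOGS BEFORE DATE_SUB(NOW(), INTERVAL 1 DAY);",
    ])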

Moral of the story?  Maybe there's something to this "continuous 
repacker" idea, but don't ruin a good thing for the rest of us because 
of newbies.
