[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK3OfOjD4XBGfu3cnMwTvCfec0Lvg3zrO16+pXtiFF4UWpFjDw@mail.gmail.com>
Date: Mon, 26 Nov 2012 14:05:12 -0600
From: Nico Williams <nico@...ptonector.com>
To: Vladislav Bolkhovitin <vst@...b.net>
Cc: Chris Friesen <chris.friesen@...band.com>,
Ryan Johnson <ryan.johnson@...utoronto.ca>,
General Discussion of SQLite Database
<sqlite-users@...ite.org>, linux-fsdevel@...r.kernel.org,
"Theodore Ts'o" <tytso@....edu>,
linux-kernel <linux-kernel@...r.kernel.org>,
Richard Hipp <drh@...ci.com>
Subject: Re: [sqlite] light weight write barriers
Vlad,
You keep saying that programmers don't understand "barriers". You've
provided no evidence of this. Meanwhile memory barriers are generally
well understood, and every programmer I know understands that a
"barrier" is a synchronization primitive that says that all operations
of a certain type will have completed prior to the barrier returning
control to its caller.
For some filesystems it is possible to configure fsync() to act as a
barrier: for example, ZFS can be told to perform no synchronous
operations for a given dataset, in which case fsync() devolves into a
simple barrier. (Cue Simon to tell us that some hardware and some
OSes, and some filesystems simply cannot implement fsync(), with or
without synchronicity.)
So just give us a barrier. Yes, I know, it's tricky to implement, but
it'd be OK to return EOPNOSUPP, and let the app do something else
(e.g., call fsync() instead, tell the user to expect instability, tell
the user to get a better system, ...).
As for implementation, it helps to have a journalled or log-structured
filesystem. It also helps to have hardware synchronization primitives
that don't suck, but these aren't entirely necessary: ZFS, for
example, can recover [*] from N incomplete transactions[**], and still
provides fsync() as a barrier given its on-disk structure and the ZIL.
Note that ZFS recovery from incomplete transactions should never be
necessary where the HW has proper cache flush support, but the
recovery functionality was added precisely because of lousy hardware.
[*] At volume import time, such as at boot-time.
[**] Granted, this requires user input, but if the user didn't care it
could be made automatic.
Nico
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists