linux-kernel - Re: [sqlite] light weight write barriers

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <alpine.DEB.2.02.1210241447210.8519@asgard.lang.hm>
Date:	Wed, 24 Oct 2012 15:03:00 -0700 (PDT)
From:	david@...g.hm
To:	Nico Williams <nico@...ptonector.com>
cc:	General Discussion of SQLite Database <sqlite-users@...ite.org>,
	杨苏立 Yang Su Li <suli@...wisc.edu>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	drh@...ci.com
Subject: Re: [sqlite] light weight write barriers

On Wed, 24 Oct 2012, Nico Williams wrote:

>> Before that happens, people will keep returning again and again with those
>> simple questions: why the queue must be flushed for any ordered operation?
>> Isn't it an obvious overkill?
>
> That [cache flushing] is not what's being asked for here.  Just a
> light-weight barrier.  My proposal works without having to add new
> system calls: a) use a COW format, b) have background threads doing
> fsync()s, c) in each transaction's root block note the last
> known-committed (from a completed fsync()) transaction's root block,
> d) have an array of well-known uberblocks large enough to accommodate
> as many transactions as possible without having to wait for any one
> fsync() to complete, e) do not reclaim space from any one past
> transaction until at least one subsequent transaction is fully
> committed.  This obtains ACI- transaction semantics (survives power
> failures but without durability for the last N transactions at
> power-failure time) without requiring changes to the OS at all, and
> with support for delayed D (durability) notification.
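
A toy model may make the quoted scheme easier to follow. This is a minimal
Python sketch under stated assumptions, not the poster's implementation:
the "disk" is an in-memory dict, a power failure may drop any subset of
writes not yet covered by an fsync(), each block write is atomic, the
background fsync() thread is reduced to an explicit sync() call, and
reclaiming space from past transactions is left out. SimDisk, CowLog,
N_SLOTS and recover() are hypothetical names. Recovery trusts only the
newest transaction that some surviving root block records as committed.

```python
"""Toy model of COW + background fsync() + well-known root slots.

Hypothetical sketch: in-memory "disk", atomic block writes, any subset of
unsynced writes may be lost on power failure, no space reclamation.
"""

N_SLOTS = 4  # hypothetical size of the ring of well-known root slots


class SimDisk:
    """Durable blocks plus a write cache that a power failure may lose."""

    def __init__(self):
        self.durable = {}  # blocks that survive a power failure
        self.pending = {}  # written but not yet covered by an fsync()

    def write(self, addr, data):
        self.pending[addr] = data

    def fsync(self):
        self.durable.update(self.pending)
        self.pending.clear()

    def crash(self, survives=lambda addr: False):
        """Disk image after power loss: the pending writes selected by
        `survives` reached the media, in no particular order."""
        image = dict(self.durable)
        image.update({a: d for a, d in self.pending.items() if survives(a)})
        return image


class CowLog:
    def __init__(self, disk):
        self.disk = disk
        self.seq = 0           # newest transaction issued
        self.last_durable = 0  # newest transaction covered by a finished fsync()

    def commit(self, value):
        self.seq += 1
        # COW: each transaction's data goes to a fresh block (never reused
        # here, since space reclamation is left out of this sketch).
        self.disk.write(("data", self.seq), value)
        # Root block in a well-known slot, noting the last known-committed txn.
        root = {"seq": self.seq, "last_durable": self.last_durable}
        self.disk.write(("root", self.seq % N_SLOTS), root)
        return self.seq

    def sync(self):
        # Stand-in for a background fsync() thread: only transactions issued
        # before the fsync() started are known durable when it returns.
        covered = self.seq
        self.disk.fsync()
        self.last_durable = covered


def recover(image):
    """Return (seq, value) of the newest transaction a root proves durable."""
    best = 0
    for slot in range(N_SLOTS):
        root = image.get(("root", slot))
        if root is not None:
            best = max(best, root["last_durable"])
    return best, image.get(("data", best))
```

Committing "a" and "b", calling sync(), committing "c" and then simulating
a power failure in which only the root blocks reach the disk recovers
transaction 2 ("b"): the root for "c" is present but is never trusted on
its own. The price is the delayed durability mentioned in the quote: a
transaction only becomes recoverable once a later root records it as
committed.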

I'm doing some work with rsyslog and its disk-based queues, and there is a
similar issue there. The good news is that we can have a version that is
Linux-specific: rsyslog is used on other OSs, but they can keep using the
existing queue implementation, so if a Linux-only version is significantly
faster, that's just a win for Linux.

As with what is being described for SQLite, losing the tail end of the
messages is not a big problem under normal conditions. But we need to be
sure that what is there is complete up to the point where data was lost.
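
One common way to guarantee that whatever survives is complete, without
any new kernel interface, is to frame each record with its length and a
checksum and, on restart, keep only the longest intact prefix. A minimal
Python sketch, assuming a hypothetical framing (little-endian 32-bit
length, CRC-32 of the payload, then the payload); this is not rsyslog's
actual queue format, and encode(), valid_prefix() and recover_file() are
illustrative names:

```python
import os
import struct
import zlib

# Hypothetical record framing: payload length, CRC-32 of the payload.
HEADER = struct.Struct("<II")


def encode(msg: bytes) -> bytes:
    """Frame one message as header + payload, ready to append to the file."""
    if not msg:
        raise ValueError("empty messages are not allowed in this format")
    return HEADER.pack(len(msg), zlib.crc32(msg)) + msg


def valid_prefix(buf: bytes):
    """Return (messages, end) for the longest run of intact records.

    Scanning stops at the first torn or corrupt record, so every message
    returned is complete and in order; bytes from `end` on are discarded.
    """
    msgs, off = [], 0
    while off + HEADER.size <= len(buf):
        length, crc = HEADER.unpack_from(buf, off)
        start = off + HEADER.size
        if length == 0 or start + length > len(buf):
            break  # zero-filled or truncated tail
        payload = buf[start:start + length]
        if zlib.crc32(payload) != crc:
            break  # garbage: the write never fully reached the disk
        msgs.append(payload)
        off = start + length
    return msgs, off


def recover_file(path):
    """Keep the intact prefix of a queue file and cut off the damaged tail."""
    with open(path, "rb") as f:
        msgs, end = valid_prefix(f.read())
    os.truncate(path, end)
    return msgs
```

If writes reach the disk out of order in this sketch, the scan stops at the
first missing or damaged record, so a later record that did land is
discarded rather than exposed after a gap.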

This is similar in concept to the write-ahead logs used by databases
(without the absolute durability requirement):

1. New messages arrive and are appended to the end of the queue file.

2. A thread updates the queue to indicate that it is in the process
of delivering a block of messages.

3. The thread updates the queue to indicate that the block of messages has
been delivered.

4. Garbage collection deletes the old messages to free up space. (If the
queue is stored in files, this can simply mean limiting the file size,
spilling to multiple files, and deleting an old file once everything in
it is marked as delivered.)
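
The four steps above can be modelled in a few lines. This is a minimal
in-memory sketch, not rsyslog's queue code: each queue file is a Python
list of records, delivery progress (steps 2 and 3) is recorded by
appending markers instead of rewriting anything in place, and a restart
rebuilds state by replaying whatever records survived. DiskQueue,
SEGMENT_MAX and the record tuples are hypothetical; a real version would
keep each segment in its own file and would still need fdatasync() (or a
cheaper barrier) between appends for the replay to be trustworthy.

```python
"""In-memory sketch of a disk queue with append-only progress markers.

Hypothetical design: lists stand in for queue files; no real I/O is done.
"""

SEGMENT_MAX = 3  # hypothetical: messages per queue file before spilling


class DiskQueue:
    def __init__(self, segments=None):
        # Each segment stands in for one queue file: an append-only record list.
        self.segments = segments if segments is not None else [[]]
        self._replay()

    def _replay(self):
        """Rebuild in-memory state from whatever records survived."""
        msgs, begun, done = {}, -1, -1
        for seg in self.segments:
            for rec in seg:
                if rec[0] == "msg":
                    msgs[rec[1]] = rec[2]
                elif rec[0] == "begin":
                    begun = max(begun, rec[1])
                elif rec[0] == "done":
                    done = max(done, rec[1])
        self.msgs = msgs
        self.next_id = max(msgs, default=-1) + 1
        # Files are only deleted once fully delivered, so anything older than
        # the oldest surviving message has been delivered too.
        self.delivered = max(done + 1, min(msgs, default=self.next_id))
        # Started but never marked done: may or may not have gone out, so it
        # is redelivered (at-least-once delivery).
        self.maybe_sent = set(range(self.delivered, begun + 1))

    def _append(self, rec):
        self.segments[-1].append(rec)  # real code: write() to the tail file

    def enqueue(self, msg):
        """Step 1: add a new message to the end of the queue."""
        if sum(r[0] == "msg" for r in self.segments[-1]) >= SEGMENT_MAX:
            self.segments.append([])  # spill to a new queue file
        self._append(("msg", self.next_id, msg))
        self.msgs[self.next_id] = msg
        self.next_id += 1

    def begin(self, n):
        """Step 2: mark a block of up to n messages as being delivered."""
        last = min(self.delivered + n, self.next_id) - 1
        self._append(("begin", last))
        return [self.msgs[i] for i in range(self.delivered, last + 1)]

    def done(self, last):
        """Step 3: mark everything up to and including `last` as delivered."""
        self._append(("done", last))
        self.delivered = last + 1
        self.maybe_sent = {i for i in self.maybe_sent if i > last}

    def gc(self):
        """Step 4: delete queue files whose messages are all delivered."""
        keep = []
        for seg in self.segments[:-1]:  # never delete the file being appended to
            ids = [r[1] for r in seg if r[0] == "msg"]
            if all(i < self.delivered for i in ids):
                for i in ids:
                    del self.msgs[i]
            else:
                keep.append(seg)
        self.segments = keep + self.segments[-1:]
```

Because the step 2 marker is written before delivery starts, a restart can
tell which messages might already have been sent (maybe_sent) and
redeliver them, instead of silently losing or skipping them.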

I don't fully understand how what you are describing (COW, separate
fsync threads, etc.) would be implemented on top of existing filesystems.
Most of it seems to require access to the underlying storage.

Could you give a more detailed explanation?

David Lang