linux-kernel - Re: [sqlite] light weight write barriers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <508B3E6E.6060702@vlnb.net>
Date:	Fri, 26 Oct 2012 21:52:46 -0400
From:	Vladislav Bolkhovitin <vst@...b.net>
To:	Nico Williams <nico@...ptonector.com>
CC:	General Discussion of SQLite Database <sqlite-users@...ite.org>,
	杨苏立 Yang Su Li <suli@...wisc.edu>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	drh@...ci.com
Subject: Re: [sqlite] light weight write barriers


Nico Williams, on 10/24/2012 05:17 PM wrote:
>> Yes, SCSI has full support for ordered/simple commands designed exactly for
>> that task: [...]
>>
>> [...]
>>
>> But historically for some reason Linux storage developers were stuck with
>> "barriers" concept, which is obviously not the same as ORDERED commands,
>> hence had a lot troubles with their ambiguous semantic. As far as I can tell
>> the reason of that was some lack of sufficiently deep SCSI understanding
>> (how to handle errors, believe that ACA is something legacy from parallel
>> SCSI times, etc.).
>
> Barriers are a very simple abstraction, so there's that.

It isn't simple at all. If you think for some time about barriers from the storage 
point of view, you will soon realize how bad and ambiguous they are.

>> Before that happens, people will keep returning again and again with those
>> simple questions: why the queue must be flushed for any ordered operation?
>> Isn't is an obvious overkill?
>
> That [cache flushing]

It isn't cache flushing, it's _queue_ flushing. You can call it queue draining, if 
you like.

Often there's a big difference where it's done: on the system side, or on the 
storage side.

Actually, performance improvements from NCQ in many cases are not because it 
allows the drive to reorder requests, as it's commonly thought, but because it 
allows to have internal drive's processing stages stay always busy without any 
idle time. Drives often have a long internal pipeline.. Hence the need to keep 
every stage of it always busy and hence why using ORDERED commands is important 
for performance.

> is not what's being asked for here. Just a
> light-weight barrier.  My proposal works without having to add new
> system calls: a) use a COW format, b) have background threads doing
> fsync()s, c) in each transaction's root block note the last
> known-committed (from a completed fsync()) transaction's root block,
> d) have an array of well-known ubberblocks large enough to accommodate
> as many transactions as possible without having to wait for any one
> fsync() to complete, d) do not reclaim space from any one past
> transaction until at least one subsequent transaction is fully
> committed.  This obtains ACI- transaction semantics (survives power
> failures but without durability for the last N transactions at
> power-failure time) without requiring changes to the OS at all, and
> with support for delayed D (durability) notification.

I believe what you really want is to be able to send to the storage a sequence of 
your favorite operations (FS operations, async IO operations, etc.) like:

Write back caching disabled:

data op11, ..., data op1N, ORDERED data op1, data op21, ..., data op2M, ...

Write back caching enabled:

data op11, ..., data op1N, ORDERED sync cache, ORDERED FUA data op1, data op21, 
..., data op2M, ...

Right?

(ORDERED means that it is guaranteed that this ordered command never in any 
circumstances will be executed before any previous command completed AND after any 
subsequent command completed.)

Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/