Message-ID: <C0F0BC787567C848B2C90989451123DA2363CBBE@ATLEXMBX4.ARRS.ARRISI.com>
Date:	Sat, 22 Jun 2013 13:40:26 +0000
From:	"Sidorov, Andrei" <Andrei.Sidorov@...isi.com>
To:	"Theodore Ts'o" <tytso@....edu>, Dave Chinner <david@...morbit.com>
CC:	Ryan Lortie <desrt@...rt.ca>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: RE: ext4 file replace guarantees

> From a philosophical point of view, I agree with you.  As I wrote in
> my earlier messages, assuming the applications aren't abusively
> calling g_file_set_contents() several times per second, I don't
> understand why Ryan is trying so hard to optimize it.  The fact that
> he's trying to optimize it at least to me seems to indicate a simple
> admission that there *are* broken applications out there, some of
> which may be calling it with high frequency, perhaps out of the UI
> thread.

Well, one application calling fsync is almost nothing to care about.
On the other hand, tens or hundreds of apps calling fsync is a disaster.

> And having general applications or generic desktop libraries trying to
> depend on specific implementation details of file systems is really
> ugly.  So it's not something I'm all that excited about.

Me too, but people have to do that because the fs API is too generic, and at
the same time one has to account for fs specifics in order to get the most out
of the fs, or at least to avoid inefficiencies.
For example, I have an app that constantly does appending writes to about 15
files, and I must ensure that no more than 5 seconds of data is lost in the
event of a system crash or power loss. How would you do that in a generic way?
The generic and portable way is to start 15 threads and call fsync on those
fds at the same time. That works fine with JFS, since it doesn't do flushes,
and it works fine with ext4, because all those fsyncs are likely to complete
within a single transaction.
However, that doesn't scale well and it forces the app to do its I/O in
bursts. A scalable but still bursty solution could be io_submit, but afaik no
fs currently supports async fsync.
What if you want to distribute the load? A single dedicated thread calling
fsync works fine with JFS, but sucks with ext4. OK, there is sync_file_range,
so let's try it out. Luckily I have control over the commit=N option of the
underlying ext4 fs, which I leave at the default 5s. Otherwise I would like to
have an ioctl to ext4 to force a commit (I'm not sure whether fsync on a
single fd will commit the currently running transaction). The sync thread
calls sync_file_range evenly over the 5s interval, and ext4 commits every 5s.
Nice! But it doesn't work with JFS. Therefore I have two implementations for
different file systems.
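
The ext4 variant described above might be sketched roughly as follows. This
is only an illustration: the helper names flush_appended() and paced_pass(),
the per-file "synced" offsets, and the choice of SYNC_FILE_RANGE_WRITE are
assumptions, not the actual application code. Note that sync_file_range()
only starts writeback of data pages and does not write metadata, so the
scheme still relies on the periodic ext4 journal commit (commit=5).

```c
#define _GNU_SOURCE          /* needed for sync_file_range() */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: start writeback for the bytes appended to fd since
 * *synced, then advance *synced. Returns 0 on success, -1 on error. */
static int flush_appended(int fd, off_t *synced)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size > *synced) {
        if (sync_file_range(fd, *synced, st.st_size - *synced,
                            SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
        *synced = st.st_size;
    }
    return 0;
}

/* Hypothetical helper: one pass over all files, spread evenly over
 * interval_ms (e.g. 5000 to match commit=5). Returns the number of files
 * for which starting writeback failed. */
static int paced_pass(const int *fds, off_t *synced, int nfds,
                      long interval_ms)
{
    assert(nfds > 0);

    long long slot_ns = (long long)interval_ms * 1000000LL / nfds;
    struct timespec slot = {
        .tv_sec  = (time_t)(slot_ns / 1000000000LL),
        .tv_nsec = (long)(slot_ns % 1000000000LL),
    };
    int failures = 0;

    for (int i = 0; i < nfds; i++) {
        if (flush_appended(fds[i], &synced[i]) != 0)
            failures++;
        nanosleep(&slot, NULL);   /* wait before touching the next file */
    }
    return failures;
}
```

A dedicated sync thread would call paced_pass(fds, synced, 15, 5000) in a
loop. On a file system without a periodic commit of this kind (JFS in the
example above), this does not bound the data loss, which is why a second
implementation is needed there.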

> Personally, I think application programmers *shouldn't* need such a
> facility, if their applications are competently designed and
> implemented.  But unfortunately, they outnumber us file system
> developers, and apparently many of them seem to want to do things
> their way, whether we like it or not.

I would argue : )
fsync is not the one to rule them all. Its semantics are clear: write all
those bytes NOW.
The fact that fsync can be used as a barrier doesn't mean it's the best way to
do it. There are quite few cases where write-right-now semantics are
absolutely required. More often apps just want atomic file updates and some
sort of writeback control, which is available only as a system-wide knob.
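
For reference, the conventional way to get an atomic file update today, with
fsync acting as the barrier, looks roughly like the sketch below. The
function name replace_file() and the ".tmp" suffix are made up for the
illustration; the sketch also skips fsync on the containing directory, which
would be needed if the rename itself must be durable.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace the contents of `path` with `len` bytes from
 * `buf` so that a reader sees either the old or the new file, never a mix.
 * Returns 0 on success, -1 on error. */
static int replace_file(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    const char *p = buf;
    int fd;

    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;                       /* path too long for the buffer */

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {                    /* handle short writes */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) != 0)                  /* the "barrier": data before rename */
        goto fail;
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    if (rename(tmp, path) != 0) {        /* atomically switch to the new file */
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```

The fsync here forces an immediate write even though the caller only needs
ordering between the data and the rename, which is the cost being argued
about above.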

As for atomic updates, I'm thinking of something like io_exec() or
io_submit_atomic(), or whatever name is best for it. It probably shouldn't be
tied to kaio.
This syscall would accept an array of iocbs and guarantee atomicity of the
update. It shouldn't be a big deal for ext4 to support this, because it
already supports data journalling, which however is only block/page-wise
atomic.
Such a syscall wouldn't be undervalued if the majority of file systems
supported it.

Regards,
Andrey.
