linux-kernel - Re: Linux 2.6.29

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090327171955.78662c1e@lxorguk.ukuu.org.uk>
Date:	Fri, 27 Mar 2009 17:19:55 +0000
From:	Alan Cox <alan@...rguk.ukuu.org.uk>
To:	Matthew Garrett <mjg59@...f.ucam.org>
Cc:	Theodore Tso <tytso@....edu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

O> If user applications should always check errors, and if errors can't be 
> reliably produced unless you fsync() before close(), then the correct 
> behaviour for the kernel is to always flush buffers to disk before 
> returning from close(). The reason we don't is that it would be an 

You make a few assumptions here

Unfortunately:
- close() occurs many times on a file
- the kernel cannot tell which close() calls need to commit data
- there are many cases where data is written and there is a genuine
  situation where it is acceptable over a crash to lose data providing
  media failure is rare (eg log files in many situations - not banks
  obviously)

The kernel cannot tell them apart, while fsync/close() as a pair allows
the user to correctly indicate their requirements.

Even "fsync on last close" can backfire horribly if you happen to have a
handle that is inherited by a child task or kept for reading for a long
period.

For an event driven app you really want some kind of threaded or async
fsync then close (fbarrier isn't quite enough because you don't get told
when the barrier is passed). That could be implemented using threads in
the relevant desktops libraries with the thread doing

	fsync()
	poke event thread
	exit

(or indeed for most cases as part of the more general
write-file-interact-with-user-etc call)

> If every application that does a clobbering rename has to call 
> fbarrier() first, then the kernel should just guarantee to do so on the 

Rename is a different problem - and a nastier one. Unfortunately even in
posix fsync says nothing about how metadata updating is handled or what
the ordering rules are between two fsync() calls on different files.

There were problems with trying to order rename against data writeback.
fsync ensures the file data and metadata is valid but doesn't (and
cannot) connect this with the directory state. So if you need to implement

	write data
	ensure it is committed
	rename it
	after the rename is committed then ...

you can't do that in POSIX. Linux extends fsync() so you can fsync a
directory handle but that is an extension to fix the problem rather than
a standard behaviour.

(Also helpful here would be fsync_range, fdatasync_range and
fbarrier_range)

> application's behalf. ext3, ext4 and btrfs all effectively do this, so 
> we should just make it explicit that Linux filesystems are expected to 
> behave this way. 

> If people want to make their code Linux specific then  that's their problem, not the kernel's.

Agreed - which is why close should not happen to do an fsync(). That's
their problem for writing code thats specific to some random may happen
behaviour on certain Linux releases - and unfortunately with no obvious
cheap cure.

--
	"Alan, I'm getting a bit worried about you."
				-- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/