Message-ID: <20101226084007.7939aabc@notabene.brown>
Date:	Sun, 26 Dec 2010 08:40:07 +1100
From:	Neil Brown <neilb@...e.de>
To:	Olaf van der Spek <olafvdspek@...il.com>
Cc:	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: Atomic non-durable file write API

On Fri, 24 Dec 2010 12:17:46 +0100 Olaf van der Spek <olafvdspek@...il.com>
wrote:

> On Thu, Dec 23, 2010 at 10:51 PM, Neil Brown <neilb@...e.de> wrote:
> > You are asking for something that doesn't exist, which is why no-one can tell
> > you what the answer is.
> 
> It seems like a very common and basic operation. If it doesn't exist
> IMO it should be created.
> 
> > The only mechanism for synchronising different filesystem operations is
> > fsync.  You should use that.
> >
> > If it is too slow, use data journalling, and place your journal on a
> > small low-latency device (NVRAM??)
> 
> This isn't about some DB-like app, it's about normal file writes, like
> archive extractions, compiling, editors, etc.
> 

Yes, it might be nice to have a very low cost way to make those safer against
corruption during a crash.
It would have to be *very* low cost, as in most cases the cost of cleaning up
after the crash instead (e.g. 'make clean') is quite low.  But people do
sometimes edit /etc/init.d files with an ordinary editor, and it would be
rather embarrassing if a crash at just the wrong time left some critical file
incomplete.  Then again, maybe it would be easier to teach editors to fsync
before rename for files in /etc .....
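
For reference, the fsync-before-rename pattern mentioned above looks roughly
like the following in user space.  This is a minimal Python sketch, not code
from any actual editor: the function name replace_file_durably and the
".tmp-" prefix are made up for illustration, and it assumes a POSIX system
where a directory can be opened and fsynced.

```python
import os
import tempfile


def replace_file_durably(path, data):
    """Replace `path` with `data` so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the rename
    # stays within one filesystem.  Note: mkstemp creates it with mode 0600;
    # a real editor would also copy ownership and permissions.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # file contents reach stable storage first
        os.replace(tmp_path, path)   # then atomically switch the name over
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Optional: fsync the directory so the rename itself is durable too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The fsync on the file is the expensive part, and it is exactly the cost the
proposal discussed in this thread is trying to avoid.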

So what would this mechanism really look like?  I think the proposal is to
delay committing the rename until the writeout of the file is complete,
without accelerating the writeout.
That would probably require delaying all updates to the directory until the
writeout was complete, as trying to reason about which changes were dependent
and which were independent is unlikely to be easy.

So as soon as you rename a file, you create a dependency between the file and
the directory such that no update for the directory may be written while any
page in the file is dirty.  Conversely, any fsync of the directory would
fsync the file as well.

Any write to the file should probably break the dependency as you can no
longer be sure what exactly the rename was supposed to protect.
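
Those rules can be modelled in a few lines.  The following is a toy
user-space simulation in Python, purely to pin down the semantics described
above.  Nothing here is kernel code, and all the names (File, Directory,
rename_into, writeout_complete, fsync_dir) are hypothetical.

```python
class File:
    def __init__(self):
        self.dirty = False      # True while any page awaits writeout
        self.held_dir = None    # directory whose updates wait on this file


class Directory:
    def __init__(self):
        self.dirty = False
        self.waiting_on = set() # files that must be clean before writeout

    def can_write_out(self):
        # No directory update may be written while a dependent file is dirty.
        return not self.waiting_on


def release(f):
    """Drop the dependency between a file and its directory, if any."""
    if f.held_dir is not None:
        f.held_dir.waiting_on.discard(f)
        f.held_dir = None


def write(f):
    f.dirty = True
    release(f)                  # any write to the file breaks the dependency


def rename_into(f, d):
    d.dirty = True
    if f.dirty:                 # rename of a dirty file creates the dependency
        f.held_dir = d
        d.waiting_on.add(f)


def writeout_complete(f):
    f.dirty = False
    release(f)                  # ordinary writeout finishing lifts the block


def fsync_dir(d):
    # fsync of the directory implies fsync of every dependent file.
    for f in list(d.waiting_on):
        writeout_complete(f)
    d.dirty = False
```

For example, after write(f) followed by rename_into(f, d), d.can_write_out()
is false until either writeout_complete(f) or another write(f) happens.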

I suspect that much of the infrastructure for this could be implemented in
the VFS/VM.  Certainly the dependency linkage between inodes, created on
rename, destroyed on write or fsync or when writeout on the inode completes,
and the fsync dependency could be common code.  Preventing writeout of
directories with dependent files would need some fs interaction.  You could
probably prototype this in ext2 quite easily to do some testing and collect
some numbers on overhead.

I think this would be an interesting project for someone to do and I would be
happy to review any patches.  Whether it ever got further than an interesting
project would depend very much on how intrusive it was to other filesystems,
how much overhead it caused, and what actual benefits resulted.
If anyone wanted to pursue this idea, they would certainly need to address
each of those in their final proposal.

I think there could be room for improved transactional semantics in Linux
filesystems.  This might be what they should look like ... don't know yet.

NeilBrown
