Date:   Thu, 12 Apr 2018 14:11:45 -0700
From:   Andres Freund <andres@...razel.de>
To:     Jeff Layton <jlayton@...hat.com>
Cc:     Dave Chinner <david@...morbit.com>,
        Andreas Dilger <adilger@...ger.ca>,
        20180410184356.GD3563@...nk.org,
        "Theodore Y. Ts'o" <tytso@....edu>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Linux FS Devel <linux-fsdevel@...r.kernel.org>,
        "Joshua D. Drake" <jd@...mandprompt.com>
Subject: Re: fsync() errors is unsafe and risks data loss

On 2018-04-12 07:24:12 -0400, Jeff Layton wrote:
> On Thu, 2018-04-12 at 15:45 +1000, Dave Chinner wrote:
> > On Wed, Apr 11, 2018 at 07:32:21PM -0700, Andres Freund wrote:
> > > Hi,
> > > 
> > > On 2018-04-12 10:09:16 +1000, Dave Chinner wrote:
> > > > To pound the broken record: there are many good reasons why Linux
> > > > filesystem developers have said "you should use direct IO" to the PG
> > > > devs each time we have this "the kernel doesn't do <complex things
> > > > PG needs>" discussion.
> > > 
> > > I personally am on board with doing that. But you also gotta recognize
> > > that an efficient DIO usage is a metric ton of work, and you need a
> > > large amount of differing logic for different platforms. It's just not
> > > realistic to do so for every platform.  Postgres is developed by a small
> > > number of people, isn't VC backed etc. The amount of resources we can
> > > throw at something is fairly limited.  I'm hoping to work on adding
> > > linux DIO support to pg, but I'm sure as hell not going to be able to
> > > do the same on windows (solaris, hpux, aix, ...) etc.
> > > 
> > > And there's cases where that just doesn't help at all. Being able to
> > > untar a database from backup / archive / timetravel / whatnot, and then
> > > fsyncing the directory tree to make sure it's actually safe, is really
> > > not an insane idea.
> > 
> > Yes it is. 
> > 
> > This is what syncfs() is for - making sure a large amount of data
> > and metadata spread across many files and subdirectories in a single
> > filesystem is pushed to stable storage in the most efficient manner
> > possible.

syncfs isn't standardized, it operates on an entire filesystem (thus
writing out unnecessary stuff), and it has no meaningful documentation of
its return codes.  Yes, using syncfs() might be better performance-wise,
but it doesn't seem like it actually solves anything, performance aside:

> Just note that the error return from syncfs is somewhat iffy. It doesn't
> necessarily return an error when one inode fails to be written back. I
> think it mainly returns errors when you get a metadata writeback error.


> You can still use syncfs but what you'd probably have to do is call
> syncfs while you still hold all of the fd's open, and then fsync each
> one afterward to ensure that they all got written back properly. That
> should work as you'd expect.

Which again doesn't allow one to use any non-bespoke tooling (like tar
or whatnot), since whatever writes the files would have to keep the fds
open. And it means you'll have to call syncfs() every few hundred
files, because you'll obviously run into file descriptor limits.
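
To make the batching concrete, here is a rough sketch of what the
"syncfs while holding the fds, then fsync each one" approach could look
like when done in chunks. The names sync_batch(), sync_all() and
BATCH_MAX are made up for illustration, the cap of 256 is an arbitrary
guess at staying under the fd limit, and the code assumes all paths live
on the same filesystem. It only shows the shape of the loop: the files
are opened after they were written, so whether errors from earlier
writeback are still reported is exactly the open question in this thread.

```c
#define _GNU_SOURCE         /* syncfs() is a Linux-specific extension */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define BATCH_MAX 256       /* hypothetical cap, meant to stay under the fd limit */

/*
 * Flush one batch of already-written files: open them all, call syncfs()
 * once while the descriptors are held, then fsync() each one to pick up
 * per-file writeback errors. Returns 0 on success, -1 on any failure.
 */
int sync_batch(const char *const paths[], size_t n)
{
    int fds[BATCH_MAX];
    size_t opened = 0;
    int rc = 0;

    assert(n <= BATCH_MAX);

    for (; opened < n; opened++) {
        fds[opened] = open(paths[opened], O_RDONLY);
        if (fds[opened] < 0) {
            rc = -1;
            break;
        }
    }

    /* One filesystem-wide flush; assumes all paths share a filesystem. */
    if (rc == 0 && n > 0 && syncfs(fds[0]) != 0)
        rc = -1;

    /* Per-file fsync() to collect errors syncfs() may not report. */
    for (size_t i = 0; i < opened; i++) {
        if (rc == 0 && fsync(fds[i]) != 0)
            rc = -1;
        close(fds[i]);
    }
    return rc;
}

/* Walk the full path list in chunks of at most BATCH_MAX files. */
int sync_all(const char *const paths[], size_t total)
{
    for (size_t off = 0; off < total; off += BATCH_MAX) {
        size_t n = total - off < BATCH_MAX ? total - off : BATCH_MAX;
        if (sync_batch(paths + off, n) != 0)
            return -1;
    }
    return 0;
}
```

Note that every chunk pays for a full syncfs(), which is the cost being
complained about above.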

Greetings,

Andres Freund
