Message-Id: <1234681910.19783.207.camel@sebastian.kern.oss.ntt.co.jp>
Date:	Sun, 15 Feb 2009 16:11:50 +0900
From:	Fernando Luis Vázquez Cao 
	<fernando@....ntt.co.jp>
To:	Dave Chinner <david@...morbit.com>
Cc:	Fernando Luis Vazquez Cao <fernando@....ac.jp>,
	Eric Sandeen <sandeen@...hat.com>, Jan Kara <jack@...e.cz>,
	Theodore Tso <tytso@....EDU>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Pavel Machek <pavel@...e.cz>,
	kernel list <linux-kernel@...r.kernel.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Ric Wheeler <rwheeler@...hat.com>
Subject: Re: vfs: Add MS_FLUSHONFSYNC mount flag

On Sun, 2009-02-15 at 13:48 +1100, Dave Chinner wrote:
> On Sat, Feb 14, 2009 at 10:03:53PM +0900, Fernando Luis Vázquez Cao wrote:
> > On Sat, 2009-02-14 at 22:24 +1100, Dave Chinner wrote:
> > > On Sat, Feb 14, 2009 at 01:29:28AM +0900, Fernando Luis Vazquez Cao wrote:
> > > > On Fri, 2009-02-13 at 23:20 +1100, Dave Chinner wrote:
> > > > > On Fri, Feb 13, 2009 at 12:20:17AM -0600, Eric Sandeen wrote:
> > > > > > I'm just a little leery of the "dangerous" mount option proliferation, I
> > > > > > guess.
> > > > > 
> > > > > You're not the only one, Eric. It's bad enough having to explain to
> > > > > users what barriers do once they have lost data after a power loss,
> > > > > let alone confusing them further by adding more mount options they
> > > > > will get wrong by accident....
> > > > 
> > > > That is precisely the reason why we should use sensible defaults, which
> > > > in this case means enabling barriers and flushing disk caches on
> > > > fsync()/fdatasync() by default.
> > > > 
> > > > Adding either a new mount option (as you yourself suggest below) or a
> > > > sysfs tunable is desirable for those cases when we really do not need to
> > > > flush the disk write cache to guarantee integrity (battery-backed block
> > > > devices come to mind), or we want to be fast at the cost of potentially
> > > > losing some data.
> > > 
> > > Mount options are the wrong place for this. if you want to change
> > > the behaviour of the block device, then it should be at that level.
> > 
> > To be more precise, what we are trying to change is the behavior of
> > fsync()/fdatasync(), which users might want to change on a per-partition
> > basis. I guess this is the reason the barrier switch was made a mount
> > option, and I just wanted to be consistent.
> 
> This has no place in the kernel. Use LD_PRELOAD to make fsync() a
> no-op.

The purpose of flushonfsync is not to make fsync() a no-op, and it goes
beyond what we can currently achieve with LD_PRELOAD. For example, if we
send the data to disk but avoid flushing the block device's write cache,
we can potentially improve I/O performance at the cost of compromising
data and filesystem integrity. This is a risk that those who play fast
and loose may want to assume. By the way, sadly enough, this is how many
of the filesystems in Linus' tree behave. I just wanted to change this
situation by making all filesystems issue write-cache flushes by
default.
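
For contrast, here is a minimal sketch of what the LD_PRELOAD approach
suggested above could look like. The file and library names are
hypothetical and this is not part of the patchset. Note that a shim like
this skips the write-out entirely, whereas flushonfsync still sends the
data to the device and only controls the write-cache flush.

```c
/*
 * fsync_noop.c: hypothetical LD_PRELOAD shim, for illustration only.
 * Build: cc -shared -fPIC -o libfsync_noop.so fsync_noop.c
 * Use:   LD_PRELOAD=./libfsync_noop.so some_program
 */
#include <unistd.h>

/* Override libc's fsync(): report success without writing anything. */
int fsync(int fd)
{
    (void)fd;
    return 0;
}

/* Same treatment for fdatasync(). */
int fdatasync(int fd)
{
    (void)fd;
    return 0;
}
```

With the shim loaded, nothing reaches the kernel on fsync()/fdatasync(),
so dirty pages stay in the page cache until normal writeback.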

Some people suggested leaving a knob for those who want to revert to
the old behavior, and I myself thought that it could make sense in some
cases, so I decided to add the flushonfsync tunable.

If there is consensus that flushonfsync should be a per-device tunable,
I am more than willing to make it so. My goal is to fix all filesystems
so that they emit barriers and disk flushes when they should. flushonfsync
is just a nicety I added for those who, for whatever reason, still want
the old behavior.

For the next iteration of this patchset I will take out the contentious
bits and leave only the filesystem/VFS fixes so that we can move forward
while we discuss the propriety of adding a per-device or a
per-filesystem tunable such as flushonfsync to change the default (and
safe) behavior.

> > > No mount option - too confusing for someone to work out what
> > > combination of barriers and flushing for things to work correctly.
> > 
> > As I suggested in a previous email, it is just a matter of using a safe
> > combination by default so that users do not need to figure out anything.
> 
> Too many users think that they need to specify everything rather
> than rely on defaults...

Well, that is their business. In my experience, most admins in the field
do not stray from the defaults provided by their enterprise distro.

> > > Just make filesystems issue the necessary flush calls or barrier IOs
> > 
> > "ext3: call blkdev_issue_flush on fsync" and "ext4: call
> > blkdev_issue_flush on fsync" in this patch set implement just that for
> > ext3/4.
> > 
> > >  and allow the block devices to ignore flushes.
> > 
> > Wouldn't it make more sense to avoid sending bios down the block layer
> > which we can know in advance are going to be ignored by the block
> > device?
> 
> As soon as the block layer reports EOPNOTSUPPORTED to a barrier IO,
> the filesystem will switch them off and not issue them anymore.

Yes, that certainly makes sense. But the point under discussion is whether
users should be allowed to switch them off (it arguably makes sense in
some scenarios). I am afraid that some users will not be happy if we do
not leave the door open for them to revert to the old behavior.
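
The automatic fallback described in the quoted text can be modeled in
user space roughly as follows. This is only a sketch: every type and
function name below is hypothetical and none of it is kernel code. The
errno is spelled EOPNOTSUPP in C.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block device. */
struct model_bdev {
    bool supports_barriers;
    int  barrier_ios;        /* barrier requests that reached the device */
};

/* Hypothetical stand-in for per-mount filesystem state. */
struct model_fs {
    struct model_bdev *bdev;
    bool barriers_enabled;
};

/* Models the block layer: barrier IO fails on devices without support. */
static int model_submit_barrier(struct model_bdev *bdev)
{
    bdev->barrier_ios++;
    if (!bdev->supports_barriers)
        return -EOPNOTSUPP;
    return 0;
}

/* Models a commit: try a barrier, switch barriers off on -EOPNOTSUPP. */
static int model_commit(struct model_fs *fs)
{
    int err;

    if (!fs->barriers_enabled)
        return 0;                      /* ordinary write, no barrier */

    err = model_submit_barrier(fs->bdev);
    if (err == -EOPNOTSUPP) {
        fs->barriers_enabled = false;  /* never try again on this mount */
        return 0;                      /* fall back to an ordinary write */
    }
    return err;
}
```

In this model the device sees at most one rejected barrier per mount.
What the model does not give users is a way to turn flushes off on a
device that does support them, which is the open question here.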

> > > I don't think we want (1) at all, and I thought that if ext3/4 are using
> > > barriers then the barrier I/O issued by the journal does the flush
> > > already. Hence (3) is redundant, right?
> > 
> > No, it is not redundant because a barrier is not issued in all cases. The
> > aforementioned two patches fix ext3/4 by emitting a device flush only
> > when necessary (i.e. when a barrier would not be emitted).
> 
> Then that is a filesystem fix, not something that requires VFS
> modifications or new mount options....

Yup, as mentioned above, flushonfsync is just a nicety I added to the
second iteration of this patchset and is independent of the filesystem
fixes.
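
To show how the two pieces relate, the decision made on fsync() can be
sketched as a small user-space model. The names are hypothetical and the
interaction shown between the tunable and the barrier path is an
assumption made for illustration, not a description of the actual
patches.

```c
#include <stdbool.h>

enum model_fsync_action {
    MODEL_RELY_ON_BARRIER,  /* the journal commit already carried a barrier */
    MODEL_EXPLICIT_FLUSH,   /* filesystem must flush the write cache itself */
    MODEL_NO_FLUSH          /* tunable off: old behavior, cache not flushed */
};

/*
 * barrier_emitted: a barrier was issued as part of this fsync().
 * flushonfsync:    the (assumed) tunable; true is the safe default.
 */
static enum model_fsync_action
model_choose_fsync_action(bool barrier_emitted, bool flushonfsync)
{
    if (barrier_emitted)
        return MODEL_RELY_ON_BARRIER;
    return flushonfsync ? MODEL_EXPLICIT_FLUSH : MODEL_NO_FLUSH;
}
```

The filesystem fix is the first branch plus the explicit flush; the
tunable only decides whether the explicit flush can be skipped.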

Regards,

Fernando
