lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160205222500.GA15463@linux.intel.com>
Date:	Fri, 5 Feb 2016 15:25:00 -0700
From:	Ross Zwisler <ross.zwisler@...ux.intel.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	Dan Williams <dan.j.williams@...el.com>,
	Dave Chinner <david@...morbit.com>,
	Matthew Wilcox <willy@...ux.intel.com>,
	Christoph Hellwig <hch@...radead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-nvdimm <linux-nvdimm@...1.01.org>
Subject: Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences

On Thu, Feb 04, 2016 at 09:29:57PM +0100, Jan Kara wrote:
> I think changes aren't very intrusive so we can feed them in during RC
> phase and frankly, you have to move to using ->writepages() anyway to make
> sync(2) work reliably.

I've been looking into this a bit more, and I don't think we actually want to
have DAX flushing in response to sync(2) or syncfs(2).  Here's some text from
the BUGS section of the sync(2) man page:

BUGS
	According to the standard specification (e.g., POSIX.1-2001), sync()
	schedules the writes,  but  may  return before  the  actual  writing
	is done.  However, since version 1.3.20 Linux does actually wait.
	(This still does not guarantee data integrity: modern disks have large
	caches.) 

Based on this I don't believe that it is a requirement that sync and syncfs
actually flush the data durably to media before they return - they just need
to make sure it has been sent to the device, which is always true for all
writes PMEM via DAX, even without any calls to dax_writeback_mapping_range().

The fsync(2) man page, on the other hand, *does* require that a call to
fsync() will result in the data being durable on the media before the system
call returns.  From fsync(2):

	fsync()  transfers  ("flushes") all modified in-core data of (i.e.,
	modified buffer cache pages for) the file referred to by the file
	descriptor fd to the disk device (or other permanent  storage  device)
	so  that  all changed information  can  be retrieved even after the
	system crashed or was rebooted.  This includes writing through or
	flushing a disk cache if present. 

I think we are doing the right thing if we only call
dax_writeback_mapping_range() in response to fsync() and msync(), but skip it
in response to sync() and syncfs().  Practically this also simplifies things a
lot because we don't need to worry about cleaning the DAX inodes.  With my
naive implementation where I was calling dax_writeback_mapping_range() in
ext4_writepages(), we were flushing all DAX pages that had ever been dirtied
every 5 seconds in response to a periodic filesystem sync(), which I'm sure is
untenable.

Please let me know if anyone disagrees with any of the above.  Based on this,
I'm off to move the calls to dax_writeback_mapping_range() back into the
filesystem specific fsync()/msync() paths so that we can provide an
appropriate block device.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ