Message-ID: <20160209094353.GF9451@quack.suse.cz>
Date: Tue, 9 Feb 2016 10:43:53 +0100
From: Jan Kara <jack@...e.cz>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Dave Chinner <david@...morbit.com>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Theodore Ts'o <tytso@....edu>,
Alexander Viro <viro@...iv.linux.org.uk>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Andrew Morton <akpm@...ux-foundation.org>,
Jan Kara <jack@...e.com>,
Matthew Wilcox <willy@...ux.intel.com>,
linux-ext4 <linux-ext4@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux MM <linux-mm@...ck.org>,
"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
XFS Developers <xfs@....sgi.com>, jmoyer <jmoyer@...hat.com>
Subject: Re: [PATCH 2/2] dax: move writeback calls into the filesystems
On Mon 08-02-16 12:55:24, Dan Williams wrote:
> On Mon, Feb 8, 2016 at 12:18 PM, Dave Chinner <david@...morbit.com> wrote:
> [..]
> >> Setting aside the current block zeroing problem you seem to be assuming
> >> that DAX will always be faster and that may not be true at a media
> >> level. Waiting years for some applications to determine if DAX makes
> >> sense for their use case seems completely reasonable. In the meantime
> >> the apps that are already making these changes want to know that a DAX
> >> mapping request has not silently dropped back to page cache. They
> >> also want to know if they successfully jumped through all the hoops to
> >> get a larger than pte mapping.
> >>
> >> I agree it is useful to be able to force DAX on an unmodified
> >> application to see what happens, and it follows that if those
> >> applications want to run in that mode they will need functional
> >> fsync()...
> >>
> >> I would feel better if we were talking about specific applications and
> >> performance numbers to know if forcing DAX on an application is a debug
> >> facility or a production level capability. You seem to have already
> >> made that determination and I'm curious what I'm missing.
> >
> > I'm not setting any policy here at all. This whole argument is
> > based around the DAX mount option doing "global fs enable or
> > silently turning it off" and the application not knowing about that.
> >
> > The whole point of having persistent per-inode DAX flags is that
> > it is a policy mechanism, not a policy. The application can, if it
> > is DAX aware, directly control whether DAX is used on a file or not.
> > The application can even query and clear that persistent inode flag
> > if it is configured not to (or cannot) use DAX.
> >
> > If the filesystem cannot support DAX, then we can error out attempts
> > to set the DAX flag and then the app knows DAX is not available.
> > i.e. the attempt to set policy failed. If the flag is set, then the
> > inode will *always* use DAX - there is no "fall back to page cache"
> > when DAX is enabled.
> >
> > If the application is not DAX aware, then the admin can control the
> > DAX policy by manipulating these flags themselves, and hence control
> > whether DAX is used by the application or not.
> >
> > If you think I'm dictating policy for DAX users and applications,
> > then you haven't understood anything I've previously said about why
> > the DAX mount option needs to die before any of this is considered
> > production ready. DAX is not an opaque "all or nothing" option. XFS
> > will provide apps and admins with fine-grained, persistent,
> > discoverable policy flags to allow admins and applications to set
> > DAX policies however they see fit. This simply cannot be done if the
> > only knob you have is a mount option that may or may not stick.
>
> I agree the mount option needs to die, and I fully grok the reasoning.
> What I'm concerned with is that a system using fully-DAX-aware
> applications is forced to incur the overhead of maintaining *sync
> semantics, periodic sync(2) in particular, even if it is not relying
> on those semantics.
Let me somewhat correct this: IMO the hard requirement is maintaining sync(2)
semantics. Periodic writeback does not have any hard durability guarantees
and we are free to ignore such requests in ->writepages() (that function
has enough information in the writeback_control structure to differentiate
between periodic writeback and data integrity sync) if we decide it is
useful. Actually, we could do that even for 4.5.
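
To illustrate the distinction, here is a rough userspace sketch, not actual
kernel code. The enum and struct are cut-down stand-ins for the definitions in
include/linux/writeback.h (only the fields used here are mirrored), and
dax_wb_is_integrity_sync() and dax_writepages_sketch() are hypothetical names
made up for this example. The assumption is that a data integrity sync shows
up as WB_SYNC_ALL and periodic writeback as WB_SYNC_NONE:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel definitions in <linux/writeback.h>.
 * Only the fields this sketch needs are mirrored here. */
enum writeback_sync_modes {
	WB_SYNC_NONE,	/* don't wait on anything: periodic/background writeback */
	WB_SYNC_ALL,	/* wait on every mapping: data integrity sync */
};

struct writeback_control {
	enum writeback_sync_modes sync_mode;
	unsigned for_kupdate:1;		/* periodic (kupdate-style) writeback */
	unsigned for_background:1;	/* background writeback */
};

/* Hypothetical helper: does this request carry a durability guarantee? */
static bool dax_wb_is_integrity_sync(const struct writeback_control *wbc)
{
	assert(wbc != NULL);
	return wbc->sync_mode == WB_SYNC_ALL;
}

/* Hypothetical ->writepages()-style entry point. *flushes counts how often
 * the (simulated) cache flush for dirty DAX ranges would have run. */
static int dax_writepages_sketch(const struct writeback_control *wbc,
				 int *flushes)
{
	/* Periodic writeback has no hard durability guarantee: skip it. */
	if (!dax_wb_is_integrity_sync(wbc))
		return 0;

	/* Data integrity sync: stand-in for flushing the dirty ranges. */
	(*flushes)++;
	return 0;
}
```

With this shape, a request with sync_mode == WB_SYNC_NONE returns without
doing any work, while a WB_SYNC_ALL request still performs the flush, so
sync(2) semantics are preserved.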
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR