lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200403183746.GQ80283@magnolia>
Date:   Fri, 3 Apr 2020 11:37:46 -0700
From:   "Darrick J. Wong" <darrick.wong@...cle.com>
To:     Ira Weiny <ira.weiny@...el.com>
Cc:     Jan Kara <jack@...e.cz>, Christoph Hellwig <hch@....de>,
        Dave Chinner <david@...morbit.com>,
        "Theodore Y. Ts'o" <tytso@....edu>,
        Dan Williams <dan.j.williams@...el.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        linux-ext4 <linux-ext4@...r.kernel.org>,
        linux-xfs <linux-xfs@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5

On Fri, Apr 03, 2020 at 11:18:43AM -0700, Ira Weiny wrote:
> On Fri, Apr 03, 2020 at 07:03:38PM +0200, Jan Kara wrote:
> > On Fri 03-04-20 08:48:29, Ira Weiny wrote:
> > > On Fri, Apr 03, 2020 at 09:27:31AM +0200, Christoph Hellwig wrote:
> > > > On Thu, Apr 02, 2020 at 01:55:19PM -0700, Ira Weiny wrote:
> > > > > > I'd just return an error for that case, don't play silly games like
> > > > > > evicting the inode.
> > > > > 
> > > > > I think I agree with Christoph here.  But I want to clarify.  I was heading in
> > > > > a direction of failing the ioctl completely.  But we could have the flag change
> > > > > with an appropriate error which could let the user know the change has been
> > > > > delayed.
> > > > > 
> > > > > But I don't immediately see what error code is appropriate for such an
> > > > > indication.  Candidates I can envision:
> > > > > 
> > > > > EAGAIN
> > > > > ERESTART
> > > > > EUSERS
> > > > > EINPROGRESS
> > > > > 
> > > > > None are perfect but I'm leaning toward EINPROGRESS.
> > > > 
> > > > I really, really dislike that idea.  The whole point of not forcing
> > > > evictions is to make it clear - no this inode is "busy" you can't
> > > > do that.  A reasonably smart application can try to evict itself.
> > > 
> > > I don't understand.  What Darrick proposed would never need any
> > > evictions.  If the file has blocks allocated the FS_XFLAG_DAX flag can
> > > not be changed.  So I don't see what good eviction would do at all.
> > 
> > I guess there's some confusion here (may well be than on my side). Darrick
> > propose that we can switch FS_XFLAG_DAX only when file has no blocks
> > allocated - fine by me. But that still does not mean than we can switch
> > S_DAX immediately, does it? Because that would still mean we need to switch
> > aops on living inode and that's ... difficult and Christoph didn't want to
> > clutter the code with it.
> > 
> > So I've understood Darrick's proposal as: Just switch FS_XFLAG_DAX flag,
> > S_DAX flag will magically switch when inode gets evicted and the inode gets
> > reloaded from the disk again. Did I misunderstand anything?
> > 
> > And my thinking was that this is surprising behavior for the user and so it
> > will likely generate lots of bug reports along the lines of "DAX inode flag
> > does not work!". So I was pondering how to make the behavior less
> > confusing... The ioctl I've suggested was just a poor attempt at that.
> 
> Ok but then I don't understand Christophs comment to "just return an error for
> that case"?  Which case?
> 
> > 
> > > > But returning an error and doing a lazy change anyway is straight from
> > > > the playbook for arcane and confusing API designs.
> > > 
> > > Jan countered with a proposal that the FS_XFLAG_DAX does change with
> > > blocks allocated.  But that S_DAX would change on eviction.  Adding that
> > > some eviction ioctl could be added.
> > 
> > No, I didn't mean that we can change FS_XFLAG_DAX with blocks allocated. I
> > was still speaking about the case without blocks allocated.
> 
> Ah ok good point.  But again what 'error' do we return when FS_XFLAG_DAX
> changed but S_DAX did not?
> 
> > 
> > > You then proposed just returning an error for that case.  (This lead me to
> > > believe that you were ok with an eviction based change of S_DAX.)
> > > 
> > > So I agreed that changing S_DAX could be delayed until an explicit eviction.
> > > But, to aid the 'smart application', a different error code could be used to
> > > indicate that the FS_XFLAG_DAX had been changed but that until that explicit
> > > eviction occurs S_DAX would remain.
> > > 
> > > So I don't fully follow what you mean by 'lazy change'?
> > > 
> > > Do you still really, really dislike an explicit eviction method for changing
> > > the S_DAX flag?
> > > 
> > > If FS_XFLAG_DAX can never be changed on a file with blocks allocated and the
> > > user wants to change the mode of operations on their 'data'; they would have to
> > > create a new file with the proper setting and move the data there.  For example
> > > copy the file into a directory marked FS_XFLAG_DAX==true?
> > > 
> > > I'm ok with either interface as I think both could be clear if documented.
> > 
> > I agree that what Darrick suggested is technically easily doable and can be
> > documented. But it is not natural behavior (i.e., different than all inode
> > flags we have) and we know how careful people are when reading
> > documentation...
> > 
> 
> Ok For 5.8 why don't we not allow FS_XFLAG_DAX to be changed on files _at_
> _all_...
> 
> In summary:
> 
>  - Applications must call statx to discover the current S_DAX state.

Ok.

>  - There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
>    the parent directory FS_XFLAG_DAX inode flag.  (There is no way to change
>    this flag after file creation.)
> 
>    If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
>    inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
>    Unless overridden...

Ok, fine with me. :)

>  - There exists a dax= mount option.
> 
>    "-o dax=off" means "never set S_DAX, ignore FS_XFLAG_DAX"
>    	"-o nodax" means "dax=off"

I surveyed the three fses that support dax and found that none of the
filesystems actually have a 'nodax' flag.  Now would be the time not to
add such a thing, and make people specify dax=off instead.  It would
be handy if we could have a single fsparam_enum for figuring out the dax
mount options.

>    "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
>    	"-o dax" by itself means "dax=always"
>    "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default
> 
>  - There exists an advisory directory inode flag FS_XFLAG_DAX that can be
>    changed at any time.  The flag state is copied into any files or
>    subdirectories when they are created within that directory.  If programs
>    require file access runs in S_DAX mode, they'll have to create those files

"...they must create..."

>    inside a directory with FS_XFLAG_DAX set, or mount the fs with an
>    appropriate dax mount option.

Otherwise seems ok to me.

--D

> 
> 
> ???
> 
> Ira
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ