lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 25 Feb 2020 13:03:06 -0800
From:   Ira Weiny <ira.weiny@...el.com>
To:     Christoph Hellwig <hch@....de>
Cc:     Dave Chinner <david@...morbit.com>, linux-kernel@...r.kernel.org,
        Alexander Viro <viro@...iv.linux.org.uk>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        Dan Williams <dan.j.williams@...el.com>,
        "Theodore Y. Ts'o" <tytso@....edu>, Jan Kara <jack@...e.cz>,
        linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH V4 07/13] fs: Add locking for a dynamic address space
 operations state

On Tue, Feb 25, 2020 at 06:36:33PM +0100, Christoph Hellwig wrote:
> On Tue, Feb 25, 2020 at 11:09:37AM +1100, Dave Chinner wrote:
> > > No, the original code was broken because it didn't serialize between
> > > DAX and buffer access.
> > > 
> > > Take a step back and look where the problems are, and they are not
> > > mostly with the aops.  In fact the only aop useful for DAX is
> > > is ->writepages, and how it uses ->writepages is actually a bit of
> > > an abuse of that interface.
> > 
> > The races are all through the fops, too, which is one of the reasons
> > Darrick mentioned we should probably move this up to file ops
> > level...
> 
> But the file ops are very simple to use.  Pass the flag in the iocb,
> and make sure the flag can only changed with i_rwsem held.  That part
> is pretty trivial, the interesting case is mmap because it is so
> spread out.
> 
> > > So what we really need it just a way to prevent switching the flag
> > > when a file is mapped,
> > 
> > That's not sufficient.
> > 
> > We also have to prevent the file from being mapped *while we are
> > switching*. Nothing in the mmap() path actually serialises against
> > filesystem operations, and the initial behavioural checks in the
> > page fault path are similarly unserialised against changing the
> > S_DAX flag.
> 
> And the important part here is ->mmap.  If ->mmap doesn't get through
> we are not going to see page faults.

The series already blocks mmap'ed files from a switch.  That was not crazy
hard.  I don't see a use case to support changing the mode while the file is
mapped.

> 
> > > and in the normal read/write path ensure the
> > > flag can't be switch while I/O is going on, which could either be
> > > done by ensuring it is only switched under i_rwsem or equivalent
> > > protection, or by setting the DAX flag once in the iocb similar to
> > > IOCB_DIRECT.
> > 
> > The iocb path is not the problem - that's entirely serialised
> > against S_DAX changes by the i_rwsem. The problem is that we have no
> > equivalent filesystem level serialisation for the entire mmap/page
> > fault path, and it checks S_DAX all over the place.
> 
> Not right now.  We have various IS_DAX checks outside it.  But it is
> easily fixable indeed.
> 
> > /me wonders if the best thing to do is to add a ->fault callout to
> > tell the filesystem to lock/unlock the inode right up at the top of
> > the page fault path, outside even the mmap_sem.  That means all the
> > methods that the page fault calls are protected against S_DAX
> > changes, and it gives us a low cost method of serialising page
> > faults against DIO (e.g. via inode_dio_wait())....
> 
> Maybe.  Especially if it solves real problems and isn't just new
> overhead to add an esoteric feature.

I thought about doing something like this but it seemed easier to block the
change while the file was mapped.

> 
> > 
> > > And they easiest way to get all this done is as a first step to
> > > just allowing switching the flag when no blocks are allocated at
> > > all, similar to how the rt flag works.
> > 
> > False equivalence - it is not similar because the RT flag changes
> > and their associated state checks are *already fully serialised* by
> > the XFS_ILOCK_EXCL. S_DAX accesses have no such serialisation, and
> > that's the problem we need to solve...
> 
> And my point is that if we ensure S_DAX can only be checked if there
> are no blocks on the file, is is fairly easy to provide the same
> guarantee.  And I've not heard any argument that we really need more
> flexibility than that.  In fact I think just being able to change it
> on the parent directory and inheriting the flag might be more than
> plenty, which would lead to a very simple implementation without any
> of the crazy overhead in this series.

Inherit on file creation or also if a file was moved under the directory?

Do we need to consider hard or soft links?

Ira

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ