Date:   Mon, 30 Oct 2017 09:38:07 +0100
From:   Jan Kara <jack@...e.cz>
To:     Dave Chinner <david@...morbit.com>
Cc:     Dan Williams <dan.j.williams@...il.com>, Jan Kara <jack@...e.cz>,
        Christoph Hellwig <hch@....de>, Michal Hocko <mhocko@...e.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        "J. Bruce Fields" <bfields@...ldses.org>,
        linux-mm <linux-mm@...ck.org>, Paul Mackerras <paulus@...ba.org>,
        Sean Hefty <sean.hefty@...el.com>,
        Jeff Layton <jlayton@...chiereds.net>,
        Matthew Wilcox <mawilcox@...rosoft.com>,
        linux-rdma@...r.kernel.org, Michael Ellerman <mpe@...erman.id.au>,
        Jason Gunthorpe <jgunthorpe@...idianresearch.com>,
        Doug Ledford <dledford@...hat.com>,
        Hal Rosenstock <hal.rosenstock@...il.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Gerald Schaefer <gerald.schaefer@...ibm.com>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-xfs@...r.kernel.org,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH v3 00/13] dax: fix dma vs truncate and remove 'page-less'
 support

Hi,

On Mon 30-10-17 13:00:23, Dave Chinner wrote:
> On Sun, Oct 29, 2017 at 04:46:44PM -0700, Dan Williams wrote:
> > Coming back to this since Dave has made clear that new locking to
> > coordinate get_user_pages() is a no-go.
> > 
> > We can unmap to force new get_user_pages() attempts to block on the
> > per-fs mmap lock, but if punch-hole finds any elevated pages it needs
> > to drop the mmap lock and wait. We need this lock dropped to get
> > around the problem that the driver will not start to drop page
> > references until it has elevated the page references on all the pages
> > in the I/O. If we need to drop the mmap lock that makes it impossible
> > to coordinate this unlock/retry loop within truncate_inode_pages_range
> > which would otherwise be the natural place to land this code.
> > 
> > Would it be palatable to unmap and drain dma in any path that needs to
> > detach blocks from an inode? Something like the following that builds
> > on dax_wait_dma() tried to achieve, but does not introduce a new lock
> > for the fs to manage:
> > 
> > retry:
> >     per_fs_mmap_lock(inode);
> >     /* new page references cannot be established */
> >     unmap_mapping_range(mapping, start, end);
> >     if ((dax_page = dax_dma_busy_page(mapping, start, end)) != NULL) {
> >         /* new page references can happen, so we need to start over */
> >         per_fs_mmap_unlock(inode);
> >         wait_for_page_idle(dax_page);
> >         goto retry;
> >     }
> >     truncate_inode_pages_range(mapping, start, end);
> >     per_fs_mmap_unlock(inode);
> 
> These retry loops you keep proposing are just bloody horrible.  They
> are basically just a method for blocking an operation until whatever
> condition is preventing the invalidation goes away. IMO, that's an
> ugly solution no matter how much lipstick you dress it up with.
> 
> i.e. the blocking loops mean the user process is going to be blocked
> for arbitrary lengths of time. That's not a solution, it's just
> passing the buck - now the userspace developers need to work around
> truncate/hole punch being randomly blocked for arbitrary lengths of
> time.

So I see a substantial difference between how you and Christoph think this
should be handled. Christoph writes in [1]:

The point is that we need to prohibit long term elevated page counts
with DAX anyway - we can't just let people grab allocated blocks forever
while ignoring file system operations.  For stage 1 we'll just need to
fail those, and in the long run they will have to use a mechanism
similar to FL_LAYOUT locks to deal with file system allocation changes.

So Christoph wants to block truncate until references are released, and to
forbid long-term references until the userspace acquiring them supports some
kind of lease-breaking. OTOH you suggest truncate should just proceed, leaving
blocks allocated until references are released. We cannot have both... I'm
leaning more towards the approach Christoph suggests, as it puts the burden
on the place which is causing it - the application holding long-term
references - and applications needing this should be sufficiently rare that
we don't have to devise a general mechanism in the kernel for this.

If the solution Christoph suggests is acceptable to you, I think we should
first write a patch to forbid acquiring long-term references to DAX blocks.
On top of that we can implement a mechanism to block truncate while there are
short-term references pending (and for that retry loops would be IMHO
acceptable). And then we can work on a mechanism to notify userspace that
it needs to drop references to blocks that are going to be truncated, so
that we can re-enable taking of long-term references.
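
To make the first two steps a bit more concrete, here is a rough userspace
model of the intended behaviour. It is only a sketch: every name in it
(dax_block_model, model_get_ref, model_truncate, ...) is made up for
illustration and is not an existing kernel interface, the -EOPNOTSUPP return
value is an assumption, and there is no real locking or sleeping. The third
step (notifying userspace so that long-term references can be re-enabled) is
not modelled at all.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct dax_block_model {
	int short_refs;	/* transient references, e.g. in-flight direct I/O */
};

enum ref_kind { REF_SHORT_TERM, REF_LONG_TERM };

/*
 * Step 1: refuse long-term references outright, so nobody can hold
 * allocated blocks forever. Short-term references are counted.
 * The error code is an assumption made for this sketch.
 */
static int model_get_ref(struct dax_block_model *b, enum ref_kind kind)
{
	if (kind == REF_LONG_TERM)
		return -EOPNOTSUPP;
	b->short_refs++;
	return 0;
}

static void model_put_ref(struct dax_block_model *b)
{
	assert(b->short_refs > 0);
	b->short_refs--;
}

/*
 * Step 2: truncate may proceed only once all short-term references are
 * gone. wait_fn stands in for "drop the lock and sleep until a holder
 * releases a reference"; it must eventually make progress or this loop
 * never terminates. Returns how many times we had to retry.
 */
static int model_truncate(struct dax_block_model *b,
			  void (*wait_fn)(struct dax_block_model *))
{
	int retries = 0;

	while (b->short_refs > 0) {
		wait_fn(b);
		retries++;
	}
	/* no references left: the blocks could be freed here */
	return retries;
}

/* Stand-in for a holder completing its I/O and dropping one reference. */
static void drop_one_ref(struct dax_block_model *b)
{
	model_put_ref(b);
}
```

With two short-term references outstanding, model_truncate(&blk,
drop_one_ref) retries twice and then proceeds, while a REF_LONG_TERM request
is rejected and takes no reference. Since step 1 guarantees that only
short-lived references exist, the wait in step 2 should be bounded in
practice, which is why I think the retry loop is tolerable there.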

								Honza

[1]
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1522887.html

-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
