lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 07 Mar 2024 10:36:28 +0000
From: David Howells <dhowells@...hat.com>
To: Matthew Wilcox <willy@...radead.org>,
    Trond Myklebust <trond.myklebust@...merspace.com>,
    Miklos Szeredi <miklos@...redi.hu>, Christoph Hellwig <hch@....de>
Cc: dhowells@...hat.com, Andrew Morton <akpm@...ux-foundation.org>,
    Alexander Viro <viro@...iv.linux.org.uk>,
    Christian Brauner <brauner@...nel.org>,
    Jeff Layton <jlayton@...nel.org>, linux-mm@...ck.org,
    linux-fsdevel@...r.kernel.org, netfs@...ts.linux.dev,
    v9fs@...ts.linux.dev, linux-afs@...ts.infradead.org,
    ceph-devel@...r.kernel.org, linux-cifs@...r.kernel.org,
    linux-nfs@...r.kernel.org, devel@...ts.orangefs.org,
    linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] mm: Replace ->launder_folio() with flush and wait

Matthew Wilcox <willy@...radead.org> wrote:

> On Wed, Mar 06, 2024 at 10:39:37PM +0000, David Howells wrote:
> > Here's a patch to have a go at getting rid of ->launder_folio().  Since it's
> > failable and cannot guarantee that pages in the range are removed, I've tried
> > to replace laundering with just flush-and-wait, dropping the folio lock around
> > the I/O.
> 
> My sense is that ->launder_folio doesn't actually need to be replaced.
> 
> commit e3db7691e9f3dff3289f64e3d98583e28afe03db
> Author: Trond Myklebust <Trond.Myklebust@...app.com>
> Date:   Wed Jan 10 23:15:39 2007 -0800
> 
>     [PATCH] NFS: Fix race in nfs_release_page()
> 
>         NFS: Fix race in nfs_release_page()
> 
>         invalidate_inode_pages2() may find the dirty bit has been set on a page
>         owing to the fact that the page may still be mapped after it was locked.
>         Only after the call to unmap_mapping_range() are we sure that the page
>         can no longer be dirtied.
>         In order to fix this, NFS has hooked the releasepage() method and tries
>         to write the page out between the call to unmap_mapping_range() and the
>         call to remove_mapping(). This, however leads to deadlocks in the page
>         reclaim code, where the page may be locked without holding a reference
>         to the inode or dentry.
> 
>         Fix is to add a new address_space_operation, launder_page(), which will
>         attempt to write out a dirty page without releasing the page lock.
> 
>     Signed-off-by: Trond Myklebust <Trond.Myklebust@...app.com>
> 
> I don't understand why this couldn't've been solved by page_mkwrite.
> NFS did later add nfs_vm_page_mkwrite in July 2007, and maybe it's just
> not needed any more?  I haven't looked into it enough to make sure,
> but my belief is that we should be able to get rid of it.

Okay, it's slightly more complex than I thought - and I'm not sure all callers
are actually using it correctly.  There are some additional interesting cases
I've found, beyond the pre-/post-DIO case:

 (1) NFS relies on it to retry the write before stripping the pages in the
     case where a writeback error occurs.  I think this can probably be dealt
     with by sticking a filemap_fdatawrite() call before the invalidation.
     I'm not sure if this would incur the deadlock with the page reclaim code
     of which Trond speaks.

 (2) invalidate_inode_pages2() is used in some places to effect invalidation
     of the pagecache in the case where the server tells us that a third party
     modified the server copy of a file.  What the right behaviour should be
     here, I'm not sure, but at the moment, any dirty data will get laundered
     back to the server.  Possibly it should be simply invalidated locally or
     the user asked how they want to handle the divergence.

     Some filesystems use invalidate_remote_inode() instead which seems to
     leave the dirty pages in place locally.

     If it is desirous to save the dirty data, then filemap_fdatawrite()
     could be deployed before invalidating the pages.

 (3) Fuse uses invalidate_inode_pages2() in fuse_do_setattr() to strip all the
     pages from an inode that has had its size changed, laundering any page
     that's still dirty.  This would seem to be excessive, but maybe Miklos
     had a reason for doing it that way.

There are some places that should perhaps be using kiocb_invalidate_pages()
and kiocb_invalidate_post_direct_write() instead - assuming Christoph has no
objection to the latter function being exported.

David


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ