lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 8 Jun 2021 14:19:15 +0200
From:   Jan Kara <jack@...e.cz>
To:     "Darrick J. Wong" <djwong@...nel.org>
Cc:     Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
        Christoph Hellwig <hch@...radead.org>,
        Dave Chinner <david@...morbit.com>, ceph-devel@...r.kernel.org,
        Chao Yu <yuchao0@...wei.com>,
        Damien Le Moal <damien.lemoal@....com>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        Jeff Layton <jlayton@...nel.org>,
        Johannes Thumshirn <jth@...nel.org>,
        linux-cifs@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net, linux-mm@...ck.org,
        linux-xfs@...r.kernel.org, Miklos Szeredi <miklos@...redi.hu>,
        Steve French <sfrench@...ba.org>, Ted Tso <tytso@....edu>,
        Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH 03/14] mm: Protect operations adding pages to page cache
 with invalidate_lock

On Mon 07-06-21 09:09:22, Darrick J. Wong wrote:
> On Mon, Jun 07, 2021 at 04:52:13PM +0200, Jan Kara wrote:
> > Currently, serializing operations such as page fault, read, or readahead
> > against hole punching is rather difficult. The basic race scheme is
> > like:
> > 
> > fallocate(FALLOC_FL_PUNCH_HOLE)			read / fault / ..
> >   truncate_inode_pages_range()
> > 						  <create pages in page
> > 						   cache here>
> >   <update fs block mapping and free blocks>
> > 
> > Now the problem is in this way read / page fault / readahead can
> > instantiate pages in page cache with potentially stale data (if blocks
> > get quickly reused). Avoiding this race is not simple - page locks do
> > not work because we want to make sure there are *no* pages in given
> > range. inode->i_rwsem does not work because page fault happens under
> > mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes
> > the performance for mixed read-write workloads suffer.
> > 
> > So create a new rw_semaphore in the address_space - invalidate_lock -
> > that protects adding of pages to page cache for page faults / reads /
> > readahead.
> > 
> > Signed-off-by: Jan Kara <jack@...e.cz>
...
> > +->fallocate implementation must be really careful to maintain page cache
> > +consistency when punching holes or performing other operations that invalidate
> > +page cache contents. Usually the filesystem needs to call
> > +truncate_inode_pages_range() to invalidate relevant range of the page cache.
> > +However the filesystem usually also needs to update its internal (and on disk)
> > +view of file offset -> disk block mapping. Until this update is finished, the
> > +filesystem needs to block page faults and reads from reloading now-stale page
> > +cache contents from the disk. VFS provides mapping->invalidate_lock for this
> > +and acquires it in shared mode in paths loading pages from disk
> > +(filemap_fault(), filemap_read(), readahead paths). The filesystem is
> > +responsible for taking this lock in its fallocate implementation and generally
> > +whenever the page cache contents needs to be invalidated because a block is
> > +moving from under a page.
> 
> Having a page cache invalidation lock isn't optional anymore, so I think
> these last two sentences could be condensed:
> 
> "...from reloading now-stale page cache contents from disk.  Since VFS
> acquires mapping->invalidate_lock in shared mode when loading pages from
> disk (filemap_fault(), filemap_read(), readahead), the fallocate
> implementation must take the invalidate_lock to prevent reloading."
> 
> > +
> > +->copy_file_range and ->remap_file_range implementations need to serialize
> > +against modifications of file data while the operation is running. For
> > +blocking changes through write(2) and similar operations inode->i_rwsem can be
> > +used. For blocking changes through memory mapping, the filesystem can use
> > +mapping->invalidate_lock provided it also acquires it in its ->page_mkwrite
> > +implementation.
> 
> Following the same line of reasoning, if taking the invalidate_lock is
> no longer optional, then the conditional language in this last sentence
> is incorrect.  How about:
> 
> "To block changes to file contents via a memory mapping during the
> operation, the filesystem must take mapping->invalidate_lock to
> coordinate with ->page_mkwrite."
> 
> The code changes look fine to me, though I'm no mm expert. ;)

OK, I've updated the documentation as you suggested. Thanks for review.

									Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ