lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230605122141.4njwwx3mrapqhvt4@quack3>
Date:   Mon, 5 Jun 2023 14:21:41 +0200
From:   Jan Kara <jack@...e.cz>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Theodore Ts'o <tytso@....edu>, Baokun Li <libaokun1@...wei.com>,
        linux-ext4@...r.kernel.org, adilger.kernel@...ger.ca, jack@...e.cz,
        ritesh.list@...il.com, linux-kernel@...r.kernel.org,
        jun.nie@...aro.org, ebiggers@...nel.org, yi.zhang@...wei.com,
        yangerkun@...wei.com, yukuai3@...wei.com,
        syzbot+a158d886ca08a3fecca4@...kaller.appspotmail.com,
        stable@...r.kernel.org
Subject: Re: [PATCH v2] ext4: fix race condition between buffer write and
 page_mkwrite

On Mon 05-06-23 11:16:55, Jan Kara wrote:
> On Mon 05-06-23 02:58:15, Matthew Wilcox wrote:
> > On Sun, Jun 04, 2023 at 05:08:21PM -0400, Theodore Ts'o wrote:
> > > On Sat, Jun 03, 2023 at 11:04:45PM -0400, Theodore Ts'o wrote:
> > > > I tried testing to see if this fixed [1], and it appears to be
> > > > triggering a lockdep warning[2] at this line in the patch:
> > > > 
> > > > [1] https://syzkaller.appspot.com/bug?extid=f4582777a19ec422b517
> > > > [2] https://syzkaller.appspot.com/x/report.txt?x=17260843280000
> > > 
> > > Looking at this more closely, the fundamental problem is by the time
> > > ext4_file_mmap() is called, the mm layer has already taken
> > > current->mm->mmap_lock, and when we try to take the inode_lock, this
> > > causes locking ordering problems with how buffered write path works,
> > > which take the inode_lock first, and then in some cases, may end up
> > > taking the mmap_lock if there is a page fault for the buffer used for
> > > the buffered write.
> > > 
> > > If we're going to stick with the approach in this patch, I think what
> > > we would need is to add a pre_mmap() function to file_operations
> > > struct, which would get called by the mmap path *before* taking
> > > current->mm->mmap_lock, so we can do the inline conversion before we
> > > take the mmap_lock.
> > > 
> > > I'm not sure how the mm folks would react to such a proposal, though.
> > > I could be seen as a bit hacky, and it's not clear that any file
> > > system other than ext4 would need something like this.  Willy, as
> > > someone who does a lot of work in both mm and fs worlds --- I'm
> > > curious what you think about this idea?
> > 
> > I'm probably missing something here, but why do we need to convert inline
> > data in page_mkwrite?  mmap() can't change i_size (stores past i_size are
> > discarded), so we should be able to simply copy the data from the page
> > cache into the inode and write the inode when it comes to writepages()
> > time.
> > 
> > Unless somebody does a truncate() or write() that expands i_size, but we
> > should be able to do the conversion then without the mmap_lock held.  No?
> > I'm not too familiar with inline data.
> 
> Yeah, I agree, that is also the conclusion I have arrived at when thinking
> about this problem now. We should be able to just remove the conversion
> from ext4_page_mkwrite() and rely on write(2) or truncate(2) doing it when
> growing i_size.

OK, thinking more about this and searching through the history, I've
realized why the conversion is originally in ext4_page_mkwrite(). The
problem is described in commit 7b4cc9787fe35b ("ext4: evict inline data
when writing to memory map") but essentially it boils down to the fact that
ext4 writeback code does not expect dirty page for a file with inline data
because ext4_write_inline_data_end() should have copied the data into the
inode and cleared the folio's dirty flag.

Indeed messing with xattrs from the writeback path to copy page contents
into inline data xattr would be ... interesting. Hum, out of good ideas for
now :-|.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ