lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230605091655.24vl5fjesfskt3o5@quack3>
Date:   Mon, 5 Jun 2023 11:16:55 +0200
From:   Jan Kara <jack@...e.cz>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Theodore Ts'o <tytso@....edu>, Baokun Li <libaokun1@...wei.com>,
        linux-ext4@...r.kernel.org, adilger.kernel@...ger.ca, jack@...e.cz,
        ritesh.list@...il.com, linux-kernel@...r.kernel.org,
        jun.nie@...aro.org, ebiggers@...nel.org, yi.zhang@...wei.com,
        yangerkun@...wei.com, yukuai3@...wei.com,
        syzbot+a158d886ca08a3fecca4@...kaller.appspotmail.com,
        stable@...r.kernel.org
Subject: Re: [PATCH v2] ext4: fix race condition between buffer write and
 page_mkwrite

On Mon 05-06-23 02:58:15, Matthew Wilcox wrote:
> On Sun, Jun 04, 2023 at 05:08:21PM -0400, Theodore Ts'o wrote:
> > On Sat, Jun 03, 2023 at 11:04:45PM -0400, Theodore Ts'o wrote:
> > > I tried testing to see if this fixed [1], and it appears to be
> > > triggering a lockdep warning[2] at this line in the patch:
> > > 
> > > [1] https://syzkaller.appspot.com/bug?extid=f4582777a19ec422b517
> > > [2] https://syzkaller.appspot.com/x/report.txt?x=17260843280000
> > 
> > Looking at this more closely, the fundamental problem is by the time
> > ext4_file_mmap() is called, the mm layer has already taken
> > current->mm->mmap_lock, and when we try to take the inode_lock, this
> > causes locking ordering problems with how buffered write path works,
> > which take the inode_lock first, and then in some cases, may end up
> > taking the mmap_lock if there is a page fault for the buffer used for
> > the buffered write.
> > 
> > If we're going to stick with the approach in this patch, I think what
> > we would need is to add a pre_mmap() function to file_operations
> > struct, which would get called by the mmap path *before* taking
> > current->mm->mmap_lock, so we can do the inline conversion before we
> > take the mmap_lock.
> > 
> > I'm not sure how the mm folks would react to such a proposal, though.
> > I could be seen as a bit hacky, and it's not clear that any file
> > system other than ext4 would need something like this.  Willy, as
> > someone who does a lot of work in both mm and fs worlds --- I'm
> > curious what you think about this idea?
> 
> I'm probably missing something here, but why do we need to convert inline
> data in page_mkwrite?  mmap() can't change i_size (stores past i_size are
> discarded), so we should be able to simply copy the data from the page
> cache into the inode and write the inode when it comes to writepages()
> time.
> 
> Unless somebody does a truncate() or write() that expands i_size, but we
> should be able to do the conversion then without the mmap_lock held.  No?
> I'm not too familiar with inline data.

Yeah, I agree, that is also the conclusion I have arrived at when thinking
about this problem now. We should be able to just remove the conversion
from ext4_page_mkwrite() and rely on write(2) or truncate(2) doing it when
growing i_size.

								Honza

-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ