lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 23 Apr 2020 20:36:57 -0300
From:   Mauricio Faria de Oliveira <mfo@...onical.com>
To:     linux-ext4@...r.kernel.org, "Theodore Y. Ts'o" <tytso@....edu>
Cc:     dann frazier <dann.frazier@...onical.com>,
        Andreas Dilger <adilger@...ger.ca>, Jan Kara <jack@...e.com>
Subject: [RFC PATCH 03/11] ext4: data=journal: call ext4_force_commit() in ext4_writepages() for msync()

The data-integrity syscalls (not memory-cleansing writeback)
use file_write_and_wait_range() that wait_on_page_writeback()
for every page in the range after do_writepages().

If any of these pages is mmap'ed pagecache, i.e., goes into
__ext4_journalled_writepage(), with the last couple patches
end_page_writeback() will be done on (or, not be done until)
transaction commit, which can take seconds (commit interval,
max commit age), which delays msync().

Let's fix this so that msync() syscall should just return
quickly without a delay of up to a few seconds by default.

For data=journal the next thing these syscalls do anyway is
ext4_force_commit() (see ext4_sync_file()), which is needed
for the buffered pagecache, as __filemap_fdatawrite_range()
doesn't do anything: the buffers are clean, so it returns
early without calling do_writepages() / ext4_write_pages().
So it's not possible to just move/replace that call here.

(This is better/more correct than to use ext4_handle_sync()
for mmap'ed pagecache, which triggers one commit per page,
as synchronous transaction batching in jbd2 targets other,
concurrent tasks, but in this case one single task writes
all pages back serially.)

Now for memory-cleansing writeback, even though it is not
supposed to wait, we should not wait for seconds either,
as it could delay an upcoming data-integrity syscall in
write_cache_pages() on a call to wait_on_page_writeback().
(another fix is needed for such calls to ext4_writepage()).

So just do not check for wbc->sync_mode to cover it too.

Signed-off-by: Mauricio Faria de Oliveira <mfo@...onical.com>
---
 fs/ext4/inode.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d385a11ba31e..574a062b8bcd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2709,7 +2709,37 @@ static int ext4_writepages(struct address_space *mapping,
 		goto out_writepages;
 
 	if (ext4_should_journal_data(inode)) {
+		journal_t *journal = sbi->s_journal;
+
 		ret = generic_writepages(mapping, wbc);
+		/*
+		 * On the data-integrity syscalls, file_write_and_wait_range()
+		 * will wait on page writeback after calling ext4_writepages().
+		 * For mmaped pagecache that only ends on transaction commit,
+		 * which may take up to commit interval (seconds!) to happen.
+		 *
+		 * So, ensure that ext4_force_commit() happens before return,
+		 * and after all pages in the range are set_page_writeback(),
+		 * but only if needed (i.e. check for datasync transaction
+		 * set in the inode by __ext4_journalled_writepage().)
+		 *
+		 * Do it for memory-cleasing writeback too, because it might
+		 * delay another data-integrity syscall in write_cache_pages()
+		 * on wait_on_page_writeback().
+		 */
+		if (!ret && journal) {
+			bool force_commit = false;
+
+			read_lock(&journal->j_state_lock);
+			if (journal->j_running_transaction &&
+			    journal->j_running_transaction->t_tid ==
+				EXT4_I(inode)->i_datasync_tid)
+				force_commit = true;
+			read_unlock(&journal->j_state_lock);
+
+			if (force_commit)
+				ret = ext4_force_commit(inode->i_sb);
+		}
 		goto out_writepages;
 	}
 
-- 
2.20.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ