linux-kernel - Re: regression in page writeback

Message-ID: <20090923012700.GA10464@localhost>
Date:	Wed, 23 Sep 2009 09:27:00 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Chris Mason <chris.mason@...cle.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	"Li, Shaohua" <shaohua.li@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"richard@....demon.co.uk" <richard@....demon.co.uk>,
	"jens.axboe@...cle.com" <jens.axboe@...cle.com>
Subject: Re: regression in page writeback

On Wed, Sep 23, 2009 at 09:17:58AM +0800, Wu Fengguang wrote:
> On Wed, Sep 23, 2009 at 08:54:52AM +0800, Andrew Morton wrote:
> > On Wed, 23 Sep 2009 08:22:20 +0800 Wu Fengguang <fengguang.wu@...el.com> wrote:
> > 
> > > Jens' per-bdi writeback has another improvement. In 2.6.31, when
> > > superblocks A and B both have 100000 dirty pages, it will first
> > > exhaust A's 100000 dirty pages before going on to sync B's.
> > 
> > That would only be true if someone broke 2.6.31.  Did they?
> > 
> > SYSCALL_DEFINE0(sync)
> > {
> > 	wakeup_pdflush(0);
> > 	sync_filesystems(0);
> > 	sync_filesystems(1);
> > 	if (unlikely(laptop_mode))
> > 		laptop_sync_completion();
> > 	return 0;
> > }
> > 
> > the sync_filesystems(0) is supposed to non-blockingly start IO against
> > all devices.  It used to do that correctly.  But people mucked with it
> > so perhaps it no longer does.
> 
> I'm referring to writeback_inodes(), each invocation of which (to sync
> 4MB) does the same iteration over superblocks A => B => C ... So if
> A has dirty pages, it will always be served first.
> 
> So if wbc->bdi == NULL (which is true for kupdate/background sync), it
> will have to first exhaust A before going on to B and C.
> 
> There is no "cursor" in the superblock-level iteration.

I even have an old patch for it, but Jens' patches are a more general solution.
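
To make the starvation concrete, here is a rough userspace sketch, not
kernel code. The names toy_sb, toy_wbc, toy_writeback and toy_run are
made up for illustration, and the 1024-page budget assumes 4MB chunks
of 4KB pages. It also simplifies the cursor: it advances after every
superblock visited, whereas the patch below advances only once
sb->s_io is empty.

```c
#include <assert.h>

/* Hypothetical stand-in for a super_block: only a dirty page count. */
struct toy_sb {
	long dirty;
};

/* Hypothetical stand-in for struct writeback_control. */
struct toy_wbc {
	long nr_to_write;	/* page budget for one invocation */
	int sb_index;		/* cursor: the superblock to continue from */
};

/*
 * One writeback invocation over n superblocks.
 * use_cursor == 0: always start from sb[0], as described above.
 * use_cursor != 0: resume from wbc->sb_index, wrapping to 0 at the end.
 */
static void toy_writeback(struct toy_sb *sb, int n, struct toy_wbc *wbc,
			  int use_cursor)
{
	int i;

	assert(n > 0);
	for (i = use_cursor ? wbc->sb_index : 0; i < n; i++) {
		long chunk = sb[i].dirty < wbc->nr_to_write ?
			     sb[i].dirty : wbc->nr_to_write;

		sb[i].dirty -= chunk;
		wbc->nr_to_write -= chunk;
		if (use_cursor)
			wbc->sb_index = i + 1;	/* next call starts after i */
		if (wbc->nr_to_write <= 0)
			break;
	}
	if (wbc->sb_index >= n)
		wbc->sb_index = 0;		/* walked off the end: wrap */
}

/* Run a number of invocations, each with a fresh 1024-page budget. */
static void toy_run(struct toy_sb *sb, int n, int passes, int use_cursor)
{
	struct toy_wbc wbc = { 0, 0 };
	int p;

	for (p = 0; p < passes; p++) {
		wbc.nr_to_write = 1024;
		toy_writeback(sb, n, &wbc, use_cursor);
	}
}
```

With two superblocks of 100000 dirty pages each, ten invocations
without the cursor take all 10240 pages from A and none from B; with
the cursor each superblock gets 5120 pages written.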

Thanks,
Fengguang
---
writeback: continue from the last super_block in syncing

Cc: David Chinner <dgc@....com>
Cc: Michael Rubin <mrubin@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Signed-off-by: Fengguang Wu <wfg@...l.ustc.edu.cn>
---
 fs/fs-writeback.c         |   12 ++++++++++++
 include/linux/writeback.h |    2 ++
 2 files changed, 14 insertions(+)

--- linux-2.6.orig/fs/fs-writeback.c
+++ linux-2.6/fs/fs-writeback.c
@@ -494,11 +494,19 @@ void
 writeback_inodes(struct writeback_control *wbc)
 {
 	struct super_block *sb;
+	int i;
+
+	if (wbc->sb_index)
+		wbc->more_io = 1;
 
 	might_sleep();
 	spin_lock(&sb_lock);
 restart:
+	i = -1;
 	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		i++;
+		if (i < wbc->sb_index)
+			continue;
 		if (sb_has_dirty_inodes(sb)) {
 			/* we're making our own get_super here */
 			sb->s_count++;
@@ -520,9 +528,13 @@ restart:
 			if (__put_super_and_need_restart(sb))
 				goto restart;
 		}
+		if (list_empty(&sb->s_io))
+			wbc->sb_index++;
 		if (wbc->nr_to_write <= 0)
 			break;
 	}
+	if (&sb->s_list == &super_blocks)
+		wbc->sb_index = 0;
 	spin_unlock(&sb_lock);
 }
 
--- linux-2.6.orig/include/linux/writeback.h
+++ linux-2.6/include/linux/writeback.h
@@ -48,6 +48,8 @@ struct writeback_control {
 					   this for each page written */
 	long pages_skipped;		/* Pages which were not written */
 
+	int sb_index;			/* the superblock to continue from */
+
 	/*
 	 * For a_ops->writepages(): is start or end are non-zero then this is
 	 * a hint that the filesystem need only write out the pages inside that