lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 12 Feb 2009 04:34:23 +0100
From:	Nick Piggin <npiggin@...e.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Jan Kara <jack@...e.cz>, "Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Federico Cuello <fedux@...men.org.ar>,
	Artem Bityutskiy <Artem.Bityutskiy@...ia.com>
Subject: Re: [Bug #12604] Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB

On Wed, Feb 11, 2009 at 06:02:19PM -0800, Linus Torvalds wrote:
> 
> 
> On Thu, 12 Feb 2009, Nick Piggin wrote:
> 
> > On Tue, Feb 10, 2009 at 05:28:30PM +0100, Jan Kara wrote:
> > > On Sun 08-02-09 20:21:42, Rafael J. Wysocki wrote:
> > > > This message has been generated automatically as a part of a report
> > > > of recent regressions.
> > > > 
> > > > The following bug entry is on the current list of known regressions
> > > > from 2.6.28.  Please verify if it still should be listed and let me know
> > > > (either way).
> > >   Yes, I've verified with latest git and the regression is still there.
> > 
> > I'm working on this FWIW...
> 
> Shouldn't we just revert it? The code does look to be broken.
> 
> It also looks like the interaction with that ever-buggy "nr_to_write" 
> thing are totally wrong. I can see that whole
> 
> 	if (!cycled) {
> 		..
> 		index = 0;
> 		goto retry
> 	}
> 
> doing all the wrong things: if we ever hit the combination of 
> "!cycled + nr_to-write==0", we're always screwed. It simply _cannot_ do 
> the right thing.

Doh, I think you spotted the bug. I was going off on the wrong track.

 
> I dunno. That whole piece-of-sh*t function has been incredibly buggy this 
> release. The code is an unreadable mess, and I think that "cyclic" stuff 
> is part of the reason for it being messy and buggy. Please convince me 
> it's worth saving, or let me revert the whole stinking pile of crud?

Well... the cyclic stuff apparently gets used by some filesystems to do
data integrity stuff, so I think the problem addressed by my broken
commit maybe shouldn't be ignored.

Maybe you're being harsh on this function. It has been two thinko bugs
so far. Before this release cycle it seemed to have the record for the
most data integrity bugs in a single function...

How's this? Jan, can you test with the bdb workload please?

--

A bug was introduced into write_cache_pages cyclic writeout by commit
31a12666d8f0c22235297e1c1575f82061480029. The intention (and comments)
is that we should cycle back and look for more dirty pages at the
beginning of the file if there is no more work to be done.

But the !done condition was dropped from the test. This means that any
time the page writeout loop breaks (eg. due to nr_to_write == 0), we
will set index to 0, then goto again. This will set done_index to index,
then find done is set, so will proceed to the end of the function. When
updating mapping->writeback_index for cyclic writeout, we now use
done_index == 0, so we're always cycling back to 0.

This seemed to be causing random mmap writes (slapadd and iozone) to
start writing more pages from the LRU and writeout would slowdown.

With this patch, iozone random write performance is increased nearly
5x on my system (iozone -B -r 4k -s 64k -s 512m -s 1200m on ext2). 

Signed-off-by: Nick Piggin <npiggin@...e.de>
---
Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c	2009-02-12 13:30:42.000000000 +1100
+++ linux-2.6/mm/page-writeback.c	2009-02-12 14:16:28.000000000 +1100
@@ -1079,7 +1079,7 @@ continue_unlock:
 		pagevec_release(&pvec);
 		cond_resched();
 	}
-	if (!cycled) {
+	if (!cycled && !done) {
 		/*
 		 * range_cyclic:
 		 * We hit the last page and there is more work to be done: wrap
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ