lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 23 Jan 2013 09:20:38 -0600
From:	Eric Sandeen <sandeen@...hat.com>
To:	Jan Kara <jack@...e.cz>
CC:	"Theodore Ts'o" <tytso@....edu>,
	Sedat Dilek <sedat.dilek@...il.com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-next <linux-next@...r.kernel.org>, mszeredi@...e.cz
Subject: Re: jbd2: don't wake kjournald unnecessarily

On 1/23/13 3:44 AM, Jan Kara wrote:
> On Tue 22-01-13 19:37:46, Eric Sandeen wrote:
>> On 1/22/13 5:50 PM, Jan Kara wrote:
>>> On Mon 21-01-13 18:11:30, Ted Tso wrote:
>>>> On Tue, Jan 22, 2013 at 12:04:32AM +0100, Sedat Dilek wrote:
>>>>>
>>>>> Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree?
>>>>>
>>>>> Feel free to add...
>>>>>
>>>>>      Tested-by: Sedat Dilek <sedat.dilek@...il.com>
>>>>>
>>>>> A similiar patch for JBD went through your tree into mainline (see [1] and [2]).
>>>>
>>>> I'm not at all convinced that this patch has anything to do with your
>>>> problem.  I don't see how it could affect things, and I believe you
>>>> mentioned that you saw the problem even with this patch applied?  (I'm
>>>> not sure; some of your messages which you sent were hard to
>>>> understand, and you mentioned something about trying to send messages
>>>> when low on sleep :-).
>>>>
>>>> In any case, the reason why I haven't pulled this patch into the ext4
>>>> tree is because I was waiting for Eric and some of the performance
>>>> team folks at Red Hat to supply some additional information about why
>>>> this commit was making a difference in performance for a particular
>>>> proprietary, closed source benchmark.
>>>   Just a small correction - it was aim7 AFAIK which isn't closed source
>>> (anymore). You can download it from SourceForge
>>> (http://sourceforge.net/projects/aimbench/files/aim-suite7/Initial%20release/).
>>> Now I have some reservations about what the benchmark does but historically
>>> it has found quite a few issues for us as well.
>>>
>>>> I'm very suspicious about applying patches under the "cargo cult"
>>>> school of programming.  ("We don't understand why it makes a
>>>> difference, but it seems to be good, so bombs away!" :-)
>>>   Well, neither am I ;) But it is obvious the patch speeds up
>>> log_start_commit() by 'a bit' (taking spinlock, disabling irqs, ...). And
>>> apparently 'a bit' is noticeable for particular workload on a particular
>>> machine - commit statistics Eric provided showed that clearly. I'd still be
>>> happier if Eric also told us how much log_start_commit() calls there were
>>> so that one could verify that 'a bit' could indeed multiply to a measurable
>>> difference. But given how simple the patch is, I gave away after a while
>>> and just merged it...
>>
>> I am still trying to get our perf guys to collect that data, FWIW...
>> I will send it when I get it.  I bugged them again today.  :)
>>
>> (Just to be sure: I was going to measure the wakeups the old way, and the
>> avoided wakeups with the new change; sound ok?)
>   Yes, that would be what I'm interested in.

Holy cow, this is much more than I expected, but here's what they report:

old JBD: AIM7 jobs/min 97624.39;  got 78193 jbd wakeups
new JBD: AIM7 jobs/min 85929.43;  got 6306999 jbd wakeups, 6264684 extra wakeups

The "extra wakeups" were hacked in like:

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index d492d57..3e0c4eb 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -433,15 +433,25 @@ int __log_space_left(journal_t *journal)
 	return left;
 }
 
+unsigned long jbd_wakeups;
+unsigned long jbd_extra_wakeups;
+
 /*
  * Called under j_state_lock.  Returns true if a transaction commit was started.
  */
 int __log_start_commit(journal_t *journal, tid_t target)
 {
 	/*
-	 * Are we already doing a recent enough commit?
+	 * The only transaction we can possibly wait upon is the
+	 * currently running transaction (if it exists).  Otherwise,
+	 * the target tid must be an old one.
 	 */
-	if (!tid_geq(journal->j_commit_request, target)) {
+	if (/* journal->j_commit_request != target && <--- ERS: Undo "fix" */
+	    journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == target) {
+		/* if we already have the right target, this is extra */
+		if (journal->j_commit_request == target)
+			jbd_extra_wakeups++;
 		/*
 		 * We want a new commit: OK, mark the request and wakup the
 		 * commit thread.  We do _not_ do the commit ourselves.
@@ -451,9 +461,17 @@ int __log_start_commit(journal_t *journal, tid_t target)
 		jbd_debug(1, "JBD: requesting commit %d/%d\n",
 			  journal->j_commit_request,
 			  journal->j_commit_sequence);
+		jbd_wakeups++;
 		wake_up(&journal->j_wait_commit);
 		return 1;
-	}
+	} else if (!tid_geq(journal->j_commit_request, target))
+		/* This should never happen, but if it does, preserve
+		   the evidence before kjournald goes into a loop and
+		   increments j_commit_sequence beyond all recognition. */
+		WARN_ONCE(1, "jbd: bad log_start_commit: %u %u %u %u\n",
+		    journal->j_commit_request, journal->j_commit_sequence,
+		    target, journal->j_running_transaction ?
+		    journal->j_running_transaction->t_tid : 0);
 	return 0;
 }
 
@@ -2039,6 +2057,7 @@ static void __exit journal_exit(void)
 	if (n)
 		printk(KERN_EMERG "JBD: leaked %d journal_heads!\n", n);
 #endif
+	printk("got %lu jbd wakeups, %lu extra wakeups\n", jbd_wakeups, jbd_extra_wakeups);
 	jbd_remove_debugfs_entry();
 	journal_destroy_caches();
 }

-Eric

> 								Honza
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ