linux-kernel - Re: lmbench regression due to cond_resched nullification change 26-rc5 vs. 25

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.1.10.0806201153470.3167@woody.linux-foundation.org>
Date:	Fri, 20 Jun 2008 12:10:14 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Christian Borntraeger <borntraeger@...ibm.com>
cc:	LKML <linux-kernel@...r.kernel.org>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: lmbench regression due to cond_resched nullification change
 26-rc5 vs. 25

On Fri, 20 Jun 2008, Christian Borntraeger wrote:
> 
> On a 6-way s390 I have seen some interesting regression in 2.6.26-rc5 vs. 
> 2.6.25 for the lmbench benchmark.
> 
> For example select file 500:
> 23 microseconds
> 32 microseconds
> 
> Several lmbench tests show a regression but I only bisected the select test 
> case so far:
> -------------------------<snip---------------------
> 
> commit c714a534d85576af21b06be605ca55cb2fb887ee
> 
>     Make 'cond_resched()' nullification depend on PREEMPT_BKL
> 
> Reverting that patch gives me the 2.6.25 performance. 
> 
> I think the patch is fine from the correctness point of view (do resched 
> inside BKL protected zones if its safe) but I dont understand why it has a 
> large impact on the select microbenchmark. Any ideas? Is it simply the 
> overhead of _cond_resched?

Yeah, I do think that it really is simply the overhead of _cond_resched.

Most places that use cond_resched() and friends do it in fairly heavy 
loops. That inner select() loop, in contrast, is rather heavily optimized, 
partly exactly because it's one of the hottest loops in one of the lmbench 
benchmarks.

Of course, it *can* also be simply that it now reschedules more, and a lot 
of lmbench benchmarks are very sensitive to scheduling behaviour. 
Batch-mode is almost invariably more efficient than any fine-grained 
scheduling can ever be, so there is always a performance count in having 
low latency.

There's also a third (and rather funny) reason the "cond_resched()" may be 
unnecessarily expensive in that loop: the "need_resched" bit may be set 
because that very same running thread was woken up! So we may end up 
deciding to reschedule because the thread thinks it needs to wake itself 
up ;)

[ That third case happens because the way we avoid race conditions with 
  going to sleep while a wakeup event happens is that we *mark* ourselves 
  as being sleeping before we actually do all the tests in lmbench. ]

So there's a few different things that may make that conditional 
reschedule a bad thing in the poll() loop.

But quite frankly, regardless of exactly why it happens, it absolutely 
makes no sense to even have that cond_resched() in the _innermost_ loop - 
the one that is called for every single fd. It's much better to move the 
conditional reschedule out a bit.

That inner loop was very much designed to compile into nice assembly 
language, and it's possible that the cond_resched() simply causes extra 
register pressure and keeps us from keeping the bitmasks in registers etc.

So this trivial patch, which moves the cond_resched() to be outside the 
single-word select() loop, would be a good idea _regardless_. Does it make 
any difference for you? If it's purely a "code generation got worse" 
issue, it should help a bit.

		Linus

---
 fs/select.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 8dda969..da0e882 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -249,7 +249,6 @@ int do_select(int n, fd_set_bits *fds, s64 *timeout)
 						retval++;
 					}
 				}
-				cond_resched();
 			}
 			if (res_in)
 				*rinp = res_in;
@@ -257,6 +256,7 @@ int do_select(int n, fd_set_bits *fds, s64 *timeout)
 				*routp = res_out;
 			if (res_ex)
 				*rexp = res_ex;
+			cond_resched();
 		}
 		wait = NULL;
 		if (retval || !*timeout || signal_pending(current))
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/