Message-Id: <1253163339.15767.62.camel@marge.simson.net>
Date:	Thu, 17 Sep 2009 06:55:39 +0200
From:	Mike Galbraith <efault@....de>
To:	Serge Belyshev <belyshev@...ni.sinp.msu.ru>
Cc:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: [patchlet] Re: Epic regression in throughput since v2.6.23

On Wed, 2009-09-16 at 23:18 +0000, Serge Belyshev wrote:
> Ingo Molnar <mingo@...e.hu> writes:
> 
> > Ok, i think we've got a handle on that finally - mind checking latest 
> > -tip?
> 
> Kernel build benchmark:
> http://img11.imageshack.us/img11/4544/makej20090916.png
> 
> I have also repeated video encode benchmarks described here:
> http://article.gmane.org/gmane.linux.kernel/889444
> 
> "x264 --preset ultrafast":
> http://img11.imageshack.us/img11/9020/ultrafast20090916.png
> 
> "x264 --preset medium":
> http://img11.imageshack.us/img11/7729/medium20090916.png

Pre-ramble..
Most of the performance differences I've examined in all these CFS vs
BFS threads boil down to fair scheduler vs unfair scheduler.  If you
favor hogs, naturally, hogs getting more bandwidth perform better than
hogs getting their fair share.  That's wonderful for hogs, somewhat less
than wonderful for their competition.  It's well known that fairness is
not necessarily the best thing for throughput.  If you've got a single
dissimilar task load running alone, favoring hogs may perform better..
or not.  What about mixed loads though?  Is the throughput of frequent
switchers less important than hog throughput?

Moving right along..

That x264 thing uncovered an interesting issue within CFS.  That load is
a frequent clone() customer, and when it has to compete against a not so
fork/clone happy load, it suffers mightily.  Even when running solo, ie
only competing against its own siblings, IFF sleeper fairness is
enabled, the pain of thread startup latency is quite visible.  With
concurrent loads, it is agonizingly painful.
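For the curious, the mechanism is place_entity() in kernel/sched_fair.c.
Roughly (simplified sketch, not the verbatim 2.6.31 source): with
START_DEBIT enabled, a freshly forked/cloned entity is placed one full
vslice beyond min_vruntime, so every new x264 worker thread starts life
owing the runqueue time and has to wait behind the already-running
competition before it gets on a CPU.

	static void
	place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
	{
		u64 vruntime = cfs_rq->min_vruntime;

		/*
		 * New (forked/cloned) entity: charge one full vslice up
		 * front so it can't starve tasks already on the runqueue.
		 * This is the debit that hurts frequent-clone loads.
		 */
		if (initial && sched_feat(START_DEBIT))
			vruntime += sched_vslice(cfs_rq, se);

		/* (sleeper credit handling for !initial omitted) */

		/* never move an entity backwards in virtual time */
		se->vruntime = max_vruntime(se->vruntime, vruntime);
	}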

concurrent load test
tbench 8 vs
x264 --preset ultrafast --no-scenecut --sync-lookahead 0 --qp 20 -o /dev/null --threads 8 soccer_4cif.y4m

(i can turn knobs and get whatever numbers i want, including
outperforming bfs, concurrent or solo.. not the point)
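(FWIW, with CONFIG_SCHED_DEBUG and debugfs mounted, these features can be
flipped at runtime instead of rebuilding, e.g.

	echo NO_START_DEBIT > /sys/kernel/debug/sched_features

which should be equivalent in effect to the compiled-in default change in
the patchlet at the end; the NO_START_DEBIT numbers below are with the
feature off.)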

START_DEBIT
encoded 600 frames, 44.29 fps, 22096.60 kb/s
encoded 600 frames, 43.59 fps, 22096.60 kb/s
encoded 600 frames, 43.78 fps, 22096.60 kb/s
encoded 600 frames, 43.77 fps, 22096.60 kb/s
encoded 600 frames, 45.67 fps, 22096.60 kb/s

8   1068214   672.35 MB/sec  execute  57 sec
8   1083785   672.16 MB/sec  execute  58 sec
8   1099188   672.18 MB/sec  execute  59 sec
8   1114626   672.00 MB/sec  cleanup  60 sec
8   1114626   671.96 MB/sec  cleanup  60 sec

NO_START_DEBIT
encoded 600 frames, 123.19 fps, 22096.60 kb/s
encoded 600 frames, 123.85 fps, 22096.60 kb/s
encoded 600 frames, 120.05 fps, 22096.60 kb/s
encoded 600 frames, 123.43 fps, 22096.60 kb/s
encoded 600 frames, 121.27 fps, 22096.60 kb/s

8    848135   533.79 MB/sec  execute  57 sec
8    860829   534.08 MB/sec  execute  58 sec
8    872840   533.74 MB/sec  execute  59 sec
8    885036   533.66 MB/sec  cleanup  60 sec
8    885036   533.64 MB/sec  cleanup  60 sec

2.6.31-bfs221-smp
encoded 600 frames, 169.00 fps, 22096.60 kb/s
encoded 600 frames, 163.85 fps, 22096.60 kb/s
encoded 600 frames, 161.00 fps, 22096.60 kb/s
encoded 600 frames, 155.57 fps, 22096.60 kb/s
encoded 600 frames, 162.01 fps, 22096.60 kb/s

8    458328   287.67 MB/sec  execute  57 sec
8    464442   288.68 MB/sec  execute  58 sec
8    471129   288.71 MB/sec  execute  59 sec
8    477643   288.61 MB/sec  cleanup  60 sec
8    477643   288.60 MB/sec  cleanup  60 sec

patchlet:

sched: disable START_DEBIT.

START_DEBIT induces unfairness to loads which fork/clone frequently when they
must compete against loads which do not.


Signed-off-by: Mike Galbraith <efault@....de>
Cc: Ingo Molnar <mingo@...e.hu>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
LKML-Reference: <new-submission>

 kernel/sched_features.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index d5059fd..2fc94a0 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -23,7 +23,7 @@ SCHED_FEAT(NORMALIZED_SLEEPER, 0)
  * Place new tasks ahead so that they do not starve already running
  * tasks
  */
-SCHED_FEAT(START_DEBIT, 1)
+SCHED_FEAT(START_DEBIT, 0)
 
 /*
  * Should wakeups try to preempt running tasks.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
