Date:	Thu, 17 Sep 2009 06:55:39 +0200
From:	Mike Galbraith <efault@....de>
To:	Serge Belyshev <belyshev@...ni.sinp.msu.ru>
Cc:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: [patchlet] Re: Epic regression in throughput since v2.6.23

On Wed, 2009-09-16 at 23:18 +0000, Serge Belyshev wrote:
> Ingo Molnar <mingo@...e.hu> writes:
> 
> > Ok, i think we've got a handle on that finally - mind checking latest 
> > -tip?
> 
> Kernel build benchmark:
> http://img11.imageshack.us/img11/4544/makej20090916.png
> 
> I have also repeated video encode benchmarks described here:
> http://article.gmane.org/gmane.linux.kernel/889444
> 
> "x264 --preset ultrafast":
> http://img11.imageshack.us/img11/9020/ultrafast20090916.png
> 
> "x264 --preset medium":
> http://img11.imageshack.us/img11/7729/medium20090916.png

Pre-ramble..
Most of the performance differences I've examined in all these CFS vs
BFS threads boil down to fair scheduler vs unfair scheduler.  If you
favor hogs, naturally, hogs getting more bandwidth perform better than
hogs getting their fair share.  That's wonderful for hogs, somewhat less
than wonderful for their competition.  It's well known that fairness is
not necessarily the best thing for throughput.  If you've got a single
dissimilar task load running alone, favoring hogs may perform better..
or not.  What about mixed loads, though?  Is the throughput of frequent
switchers less important than hog throughput?

Moving right along..

That x264 thing uncovered an interesting issue within CFS.  That load is
a frequent clone() customer, and when it has to compete against a not so
fork/clone happy load, it suffers mightily.  Even when running solo, i.e.
only competing against its own siblings, IFF sleeper fairness is
enabled, the pain of thread startup latency is quite visible.  With
concurrent loads, it is agonizingly painful.
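
For the curious: the debit comes from place_entity() in
kernel/sched_fair.c.  From memory, so treat the details as approximate
rather than gospel, a freshly forked entity gets placed a full vslice
beyond the queue's min_vruntime:

	static void
	place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
	{
		u64 vruntime = cfs_rq->min_vruntime;

		/*
		 * Push new tasks one vslice past min_vruntime so they
		 * can't starve tasks which are already running.
		 */
		if (initial && sched_feat(START_DEBIT))
			vruntime += sched_vslice(cfs_rq, se);
		...
	}

A load like x264, which clones workers constantly, pays that startup
debit over and over, so its threads begin life at the back of the queue
every time.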

concurrent load test
tbench 8 vs
x264 --preset ultrafast --no-scenecut --sync-lookahead 0 --qp 20 -o /dev/null --threads 8 soccer_4cif.y4m

(I can turn knobs and get whatever numbers I want, including
outperforming BFS, concurrent or solo.. that's not the point)
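
(For anyone wanting to twiddle the same knobs: with CONFIG_SCHED_DEBUG
set, the feature bits can be flipped at runtime, e.g.

	echo NO_START_DEBIT > /sys/kernel/debug/sched_features

assuming debugfs is mounted at /sys/kernel/debug.  The tbench lines
below are its standard periodic output: client count, cumulative
operation count, throughput, run phase, and elapsed seconds, if memory
serves.)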

START_DEBIT
encoded 600 frames, 44.29 fps, 22096.60 kb/s
encoded 600 frames, 43.59 fps, 22096.60 kb/s
encoded 600 frames, 43.78 fps, 22096.60 kb/s
encoded 600 frames, 43.77 fps, 22096.60 kb/s
encoded 600 frames, 45.67 fps, 22096.60 kb/s

8   1068214   672.35 MB/sec  execute  57 sec
8   1083785   672.16 MB/sec  execute  58 sec
8   1099188   672.18 MB/sec  execute  59 sec
8   1114626   672.00 MB/sec  cleanup  60 sec
8   1114626   671.96 MB/sec  cleanup  60 sec

NO_START_DEBIT
encoded 600 frames, 123.19 fps, 22096.60 kb/s
encoded 600 frames, 123.85 fps, 22096.60 kb/s
encoded 600 frames, 120.05 fps, 22096.60 kb/s
encoded 600 frames, 123.43 fps, 22096.60 kb/s
encoded 600 frames, 121.27 fps, 22096.60 kb/s

8    848135   533.79 MB/sec  execute  57 sec
8    860829   534.08 MB/sec  execute  58 sec
8    872840   533.74 MB/sec  execute  59 sec
8    885036   533.66 MB/sec  cleanup  60 sec
8    885036   533.64 MB/sec  cleanup  60 sec

2.6.31-bfs221-smp
encoded 600 frames, 169.00 fps, 22096.60 kb/s
encoded 600 frames, 163.85 fps, 22096.60 kb/s
encoded 600 frames, 161.00 fps, 22096.60 kb/s
encoded 600 frames, 155.57 fps, 22096.60 kb/s
encoded 600 frames, 162.01 fps, 22096.60 kb/s

8    458328   287.67 MB/sec  execute  57 sec
8    464442   288.68 MB/sec  execute  58 sec
8    471129   288.71 MB/sec  execute  59 sec
8    477643   288.61 MB/sec  cleanup  60 sec
8    477643   288.60 MB/sec  cleanup  60 sec

patchlet:

sched: disable START_DEBIT.

START_DEBIT induces unfairness to loads which fork/clone frequently when they
must compete against loads which do not.


Signed-off-by: Mike Galbraith <efault@....de>
Cc: Ingo Molnar <mingo@...e.hu>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
LKML-Reference: <new-submission>

 kernel/sched_features.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index d5059fd..2fc94a0 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -23,7 +23,7 @@ SCHED_FEAT(NORMALIZED_SLEEPER, 0)
  * Place new tasks ahead so that they do not starve already running
  * tasks
  */
-SCHED_FEAT(START_DEBIT, 1)
+SCHED_FEAT(START_DEBIT, 0)
 
 /*
  * Should wakeups try to preempt running tasks.
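
(Aside, from memory of how kernel/sched.c wires this up, so roughly:
sched_features.h gets included twice with different SCHED_FEAT()
definitions, once to build an enum of feature bits and once to build
the default mask, and sched_feat() is just a bit test:

	#define SCHED_FEAT(name, enabled)				\
		(1UL << __SCHED_FEAT_##name) * enabled |

	const_debug unsigned int sysctl_sched_features =
	#include "sched_features.h"
		0;

	#define sched_feat(x) (sysctl_sched_features & (1UL << __SCHED_FEAT_##x))

so the one-liner above only changes the compiled-in default; the
runtime toggle via /sys/kernel/debug/sched_features works either way.)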


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ