Message-Id: <201106051620.07758.kernel@kolivas.org>
Date:	Sun, 5 Jun 2011 16:20:07 +1000
From:	Con Kolivas <kernel@...ivas.org>
To:	linux-kernel@...r.kernel.org
Subject: [ANNOUNCE] BFS CPU scheduler v0.406 for 2.6.39

People have requested an official BFS announcement separate from the -ck 
announcement, so this is to officially announce the first stable BFS CPU 
scheduler for 2.6.39.

http://www.kernel.org/pub/linux/kernel/people/ck/patches/bfs/

The major change going into the 0.4x BFS version is a complete rework of 
busy CPU balancing that simplifies it even further. It now just flags a task 
as "sticky" if it is descheduled while still seeking further CPU time. At most 
one sticky task exists per CPU. A sticky task is heavily biased against moving 
to a different CPU. If a scaling CPU frequency governor is in use, sticky 
tasks will not move to a throttled CPU at all, instead preferring to wait 
until the CPU they came from is available again. As CPU frequency scaling is 
per-CPU, and BFS is a global CPU scheduler, BFS was previously unable to make 
the most of dynamic scaling and the turbo modes of newer CPUs. This change 
substantially improves BFS for both of those. A long-standing bug which gave 
low HZ configurations poor latency behaviour was also found and fixed. With 
this change, throughput was found to improve further, so the latency target 
for BFS was improved to be the same baseline default of 6ms regardless of the 
number of CPUs.
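
To make the sticky rule above concrete, here is a minimal userspace sketch in 
C. It is not taken from the BFS patch: every name in it (struct task, 
struct cpu, mark_descheduled, migration_cost, the COST_* values) is 
hypothetical, and the numeric weights are arbitrary. It only models the three 
rules described above: a task becomes sticky when descheduled while still 
wanting CPU time, each CPU tracks at most one sticky task, and a sticky task 
is heavily penalised for moving, or refused outright if the target CPU is 
throttled under a scaling governor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical illustrations, not BFS source code. */
enum move_cost {
    COST_NEVER    = -1,  /* task must not be placed on this CPU */
    COST_SAME_CPU = 0,   /* staying on the previous CPU is free */
    COST_NORMAL   = 1,   /* non-sticky task moving to another CPU */
    COST_STICKY   = 100  /* arbitrary heavy bias against moving */
};

struct task {
    int  last_cpu;  /* CPU the task last ran on */
    bool sticky;    /* descheduled while still wanting CPU time */
};

struct cpu {
    bool throttled;            /* governor has scaled this CPU down */
    struct task *sticky_task;  /* at most one sticky task per CPU */
};

/* Called when task t is taken off CPU c. */
void mark_descheduled(struct cpu *c, struct task *t, bool wants_more_cpu)
{
    assert(c != NULL && t != NULL);

    if (!wants_more_cpu) {
        /* The task gave up the CPU voluntarily: it is not sticky. */
        t->sticky = false;
        if (c->sticky_task == t)
            c->sticky_task = NULL;
        return;
    }

    /* Only one sticky task per CPU: the newcomer displaces the old one. */
    if (c->sticky_task != NULL && c->sticky_task != t)
        c->sticky_task->sticky = false;

    t->sticky = true;
    c->sticky_task = t;
}

/* Cost of running t on target_cpu; COST_NEVER means "do not move here". */
enum move_cost migration_cost(const struct task *t, int target_cpu,
                              const struct cpu *cpus, bool scaling_governor)
{
    assert(t != NULL && cpus != NULL);

    if (target_cpu == t->last_cpu)
        return COST_SAME_CPU;
    if (!t->sticky)
        return COST_NORMAL;
    /* Sticky tasks wait for their own CPU rather than take a throttled one. */
    if (scaling_governor && cpus[target_cpu].throttled)
        return COST_NEVER;
    return COST_STICKY;
}
```

A caller choosing a CPU for a task would skip any CPU that returns COST_NEVER 
and otherwise prefer the lowest cost. How the displaced sticky task is 
handled in the real scheduler is not described above, so simply clearing its 
flag is an assumption of this sketch.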

A few of the highlights where BFS performs particularly well came out in 
benchmarking on a 6x AMD machine. (This is not meant to be a claim that BFS is 
better on all workloads and all hardware, but it demonstrates that BFS has 
advantages in certain workloads on hardware up to reasonably powerful 
commodity machines.)

Full results (desktop = 1000Hz + preempt, server = 100Hz + no preempt) are 
here:
http://ck.kolivas.org/patches/bfs/bfs404-cfs/

The highlight graphs on AMDx6 (obviously cherry-picked!) follow:

Throughput with make -j6 is: thoroughput-j6 [sic]

Throughput with x264 ultrafast is: thoroughput-ultrafast

Latency in the presence of x264 ultrafast is: latency-ultrafast-log

Thanks to Serge Belyshev for 6x results, statistical analysis and graphs.

-- 
-ck

Download attachment "thoroughput-j6.png" of type "image/png" (12699 bytes)

Download attachment "thoroughput-ultrafast.png" of type "image/png" (10875 bytes)

Download attachment "latency-ultrafast-log.png" of type "image/png" (12692 bytes)