Message-ID: <45DD5217.3000608@cosmosbay.com>
Date: Thu, 22 Feb 2007 09:19:35 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Ingo Molnar <mingo@...e.hu>,
Andrew Morton <akpm@...ux-foundation.org>
CC: linux-kernel@...r.kernel.org
Subject: [PATCH, take 2] Speedup divides by cpu_power in scheduler
I noticed expensive divides done in try_to_wake_up() and find_busiest_group()
on a machine with two dual-core Opterons (4 cores total), moderately loaded
(15,000 context switches per second).
oprofile numbers:
CPU: AMD64 processors, speed 2600.05 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 50000
samples % symbol name
...
613914 1.0498 try_to_wake_up
834 0.0013 :ffffffff80227ae1: div %rcx
77513 0.1191 :ffffffff80227ae4: mov %rax,%r11
608893 1.0413 find_busiest_group
1841 0.0031 :ffffffff802260bf: div %rdi
140109 0.2394 :ffffffff802260c2: test %sil,%sil
Some of these divides can use the reciprocal divide technique we introduced
some time ago (currently used in slab, AFAIK).
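To make the trick concrete, here is a minimal standalone sketch of the
technique, modelled on the reciprocal_value()/reciprocal_divide() helpers in
include/linux/reciprocal_div.h (the main() and the sample values are mine, for
illustration only): when the divisor B changes rarely, precompute
R = ceil(2^32 / B) once, then A / B becomes the cheap (A * R) >> 32. The
quotient can be off by one in some corner cases, which is harmless for
load-balancing heuristics; the point is replacing an expensive div instruction
with a multiply and a shift.

#include <stdint.h>
#include <stdio.h>

/* Precompute R = ceil(2^32 / b) once, when b is set (b != 0). */
static uint32_t reciprocal_value(uint32_t b)
{
        uint64_t val = (1ULL << 32) + (b - 1);
        return (uint32_t)(val / b);
}

/* a / b becomes a 32x32->64 multiply and a shift: (a * R) >> 32. */
static uint32_t reciprocal_divide(uint32_t a, uint32_t r)
{
        return (uint32_t)(((uint64_t)a * r) >> 32);
}

int main(void)
{
        uint32_t load = 4096, power = 192;   /* sample values only */
        uint32_t r = reciprocal_value(power);

        printf("%u / %u = %u, reciprocal trick gives %u\n",
               load, power, load / power, reciprocal_divide(load, r));
        return 0;
}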
We can assume a load will fit in a 32-bit number: even with a
SCHED_LOAD_SCALE=128 value, that still leaves a theoretical limit of
2^32 / 128 = 33554432.
When/if we reach this limit one day, CPUs will probably have a fast hardware
divide and we can drop the reciprocal divide trick.
Ingo suggested renaming cpu_power to __cpu_power, to make it clear that it
should not be modified without updating its reciprocal value too.
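The intent is that the two fields only ever change together. A minimal sketch
of that pairing, reusing the helpers above (the struct and function names here
are illustrative, not necessarily the ones in the attached patch):

struct sched_group_demo {
        unsigned int __cpu_power;        /* never written directly */
        uint32_t reciprocal_cpu_power;   /* cached reciprocal_value(__cpu_power) */
};

/* The only sanctioned way to change a group's power. */
static void sg_set_cpu_power(struct sched_group_demo *sg, unsigned int power)
{
        sg->__cpu_power = power;
        sg->reciprocal_cpu_power = reciprocal_value(power);
}

/* load / __cpu_power, via the cached reciprocal. */
static uint32_t sg_div_cpu_power(const struct sched_group_demo *sg, uint32_t load)
{
        return reciprocal_divide(load, sg->reciprocal_cpu_power);
}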
I did not convert the divide in cpu_avg_load_per_task(), because tracking
nr_running changes may not be worth it. We could use a static table of 32
reciprocal values, but it would add a conditional branch and a table lookup
(sketched below).
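For reference, the rejected alternative would look roughly like this (again
reusing the helpers above; the names and the 32-entry cutoff are mine, this
code is not in the attached patch):

/* Reciprocals for nr_running = 1..32, filled once at init. */
static uint32_t nr_running_reciprocal[33];

static void init_nr_running_reciprocals(void)
{
        unsigned int i;

        for (i = 1; i <= 32; i++)
                nr_running_reciprocal[i] = reciprocal_value(i);
}

/* Caller guarantees nr_running >= 1. */
static uint32_t avg_load_per_task(uint32_t raw_load, uint32_t nr_running)
{
        if (nr_running <= 32)   /* the extra branch + table lookup */
                return reciprocal_divide(raw_load, nr_running_reciprocal[nr_running]);
        return raw_load / nr_running;   /* rare slow path: real divide */
}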
Signed-off-by: Eric Dumazet <dada1@...mosbay.com>