lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070703185748.GA4047@Krystal>
Date:	Tue, 3 Jul 2007 14:57:48 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Alexey Dobriyan <adobriyan@...il.com>
Cc:	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 10/10] Scheduler profiling - Use immediate values

* Alexey Dobriyan (adobriyan@...il.com) wrote:
> On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > Use immediate values with lower d-cache hit in optimized version as a
> > condition for scheduler profiling call.
> 
> How much difference in performance do you see?
> 

Hi Alexey,

Please have a look at Documentation/immediate.txt for that information.
Also note that the main advantage of the load immediate is to free a
cache line. Therefore, I guess the best way to quantify the improvement
it brings at one single site is not in terms of cycles, but in terms of
number of cache lines used by the scheduler code. Since memory bandwidth
seems to be an increasing bottleneck (CPU frequency increases faster
than the available memory bandwidth), it makes sense to free as much
cache lines as we can.

Measuring the overall impact on the system of this single modification
results in the difference brought by one site within the standard
deviation of the normal samples. It will become significant when the
number of immediate values used instead of global variables at hot
kernel paths (need to ponder with the frequency at which the data is
accessed) will start to be significant compared to the L1 data cache
size. We could characterize this in memory to L1 cache transfers per
seconds.

On 3GHz P4:

memory read: ~48 cycles

So we can definitely say that 48*HZ (approximation of the frequency at
which the scheduler is called) won't make much difference, but as it
grows, it will.

On a 1000HZ system, it results in:

48000 cycles/second, or 16µs/second, or 0.000016% speedup.

However, if we place this in code called much more often, such as
do_page_fault, we get, with an hypotetical scenario of approximation
of 100000 page faults per second:

4800000 cycles/s, 1.6ms/second or 0.0016% speedup.

So as the number of immediate values used increase, the overall memory
bandwidth required by the kernel will go down.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ