Message-ID: <20070613155724.GA8703@Krystal>
Date: Wed, 13 Jun 2007 11:57:24 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Adrian Bunk <bunk@...sta.de>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 1/9] Conditional Calls - Architecture Independent Code
Hi Adrian,
* Adrian Bunk (bunk@...sta.de) wrote:
> I have two questions for getting the bigger picture:
>
> 1. How much code will be changed?
> Looking at the F00F bug fixup example, it seems we'll have to make
> several functions in every single driver conditional in the kernel for
> getting the best performance.
> How many functions do you plan to make conditional this way?
>
I just changed the infrastructure to follow Andi's advice: the
cond_calls are now "fancy" variables. They refer to a static variable's
address, and every update (which must go through the cond call API)
patches every load immediate that refers to this variable. They can
therefore simply be embedded in an if(cond_call(var)) statement, so no
large code changes are needed.
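To make this concrete, here is a minimal, self-contained sketch of a
call site using the generic (non-optimized) fallback, where the
condition is simply a shared integer read behind a branch-prediction
hint. The names (trace_enabled, trace_event, do_syscall_work) and the
cond_call() definition below are illustrative assumptions, not the
actual API from the patch:

#include <stdio.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Each conditional call site refers to a variable like this one;
 * in the real patch it is only updated through the cond call API. */
static int trace_enabled;

/* Generic fallback: a plain load of the variable, hinted as rarely true. */
#define cond_call(var)	unlikely(var)

static void trace_event(const char *what)
{
	printf("trace: %s\n", what);
}

static long do_syscall_work(void)
{
	if (cond_call(trace_enabled))	/* usually falls through when disabled */
		trace_event("syscall entry");
	return 0;
}

int main(void)
{
	do_syscall_work();	/* tracing off: nothing printed */
	trace_enabled = 1;	/* the optimized variant would also patch the
				 * load immediate at every referring site */
	do_syscall_work();	/* tracing on: the event is printed */
	return 0;
}

The point is that enabling or disabling only changes the value seen by
cond_call(); the if() statements at the call sites stay untouched,
which is why sprinkling them through existing code requires so little
change.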
> 2. What is the real-life performance improvement?
> That micro benchmarks comparing cache hits with cache misses give great
> looking numbers is obvious.
> But what will be the performance improvement in real workloads after the
> functions you plan to make conditional according to question 1 have been
> made conditional?
>
Hrm, I am trying to get interesting numbers out of lmbench: I just ran a
test on a kernel sprinkled with about 50 markers at important sites
(LTTng markers: system call entry/exit, traps, interrupt handlers, ...).
The markers are compiled in, but left in the "disabled" state. Since the
markers re-use the cond_call infrastructure, each marker has its own cond_call.
I ran the test in two situations on my Pentium 4 box:
1 - Cond call optimizations are disabled. This is equivalent to using a
global variable (in kernel data) as the condition for the branch.
2 - Cond call optimizations are enabled. This uses the load immediate
(which now loads an integer on x86 instead of a char, to avoid a
pipeline stall from a false register dependency).
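For illustration, here is a rough, x86-only sketch (GCC/Clang inline
asm) of the idea behind situation 2: the flag lives as an immediate
operand inside the instruction stream instead of in kernel data, and
enabling or disabling rewrites that immediate. The real patch also
records each site's address in a special section so the update code can
find and patch the instruction; that bookkeeping is omitted here, and
the function name is made up:

#include <stdio.h>

/*
 * The flag is encoded as the 32-bit immediate of a mov instruction
 * rather than being loaded from memory.  Using a full integer instead
 * of a char avoids a partial-register false dependency.  The cond call
 * update code would rewrite this immediate at runtime.
 */
static inline int cond_call_immediate(void)
{
	int enabled;

	asm volatile("movl $0, %0" : "=r" (enabled));
	return enabled;
}

int main(void)
{
	printf("cond call is %s\n",
	       cond_call_immediate() ? "enabled" : "disabled");
	return 0;
}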
The result is that we really cannot tell which one is faster or slower;
the standard deviation is much larger than the difference between the
two situations.
Note that lmbench is a workload that does not put much stress on the L1
cache, since it repeats the same tests many times. Do you have any
suggestions for a test that would be more representative of a real,
diversified workload (in terms of in-kernel locality of reference)?
Thanks,
Mathieu
> TIA
> Adrian
>
> --
>
> "Is there not promise of rain?" Ling Tan asked suddenly out
> of the darkness. There had been need of rain for many days.
> "Only a promise," Lao Er said.
> Pearl S. Buck - Dragon Seed
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68