linux-kernel - Re: [PATCH 1/2] bitreversal program

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4834035D.5090703@keyaccess.nl>
Date:	Wed, 21 May 2008 13:11:25 +0200
From:	Rene Herman <rene.herman@...access.nl>
To:	Soumyadip Das Mahapatra <kernelhacker@...ualserver.org>
CC:	Benoit Boissinot <bboissin@...il.com>,
	Akinobu Mita <akinobu.mita@...il.com>,
	Harvey Harrison <harvey.harrison@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] bitreversal program

On 21-05-08 10:54, Soumyadip Das Mahapatra wrote:

> Sorry to disturb you again. But i tested my code against Akinobu's one
> and the test result shows my code takes less cpu time than that of
> Akinobu's.

The unfortunate thing about these kinds of changes is that they're not 
all that easily tested. Straightforwardness would suggest that obviously 
the current table driven method will be faster due to needing fewer code 
cycles. Cache considerations add to that in the sense of instruction 
cache and can (!) detract from it in the sense of data cache; sometimes 
dramaticaly detract due to cache misses basically dwarving most anything 
else.

However, in this case the table is a tiny 256-byte one which isn't even 
going to be pulled in completely in normal usage (just the cache-lines 
needed) while on the other hand the extra i-cache pressure from the 
increased code in your version is always there.

It's unexpected that you would get better results from your new code 
(and I'm not; I took Benoit's posted test and get 15 seconds for your 
version versus 9 for the original table-driven one) and in this case, 
reality wouldn't contradict the micro-benchmark either. It's when the 
table grows and, especially, more of it is needed on a regular basis 
that you'd start to worry.

PS: If you're going to go really micro, there are even going to be 
differences between bitreversing 0x00000000 which is just going to need 
the first byte (hence cacheline) and say 0x004080c0 which is going to 
occupy 4 cachelines. Again not in the isolated test though; the data in 
this case is small enough that you should be having a hard time getting 
your version to perform better -- forking off a competing process that 
does its best to dirty cache might do it, but then you're in a situation 
which is no longer real-world with respect to this "call once" bit of API...

Rene.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/