linux-kernel - Re: [patch] speed up / fix the new generic semaphore code (fix AIM7 40% regression with 2.6.26-rc1)

Message-ID: <alpine.LFD.1.10.0805081610350.2940@woody.linux-foundation.org>
Date:	Thu, 8 May 2008 16:14:09 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Ingo Molnar <mingo@...e.hu>
cc:	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>,
	Andi Kleen <andi@...stfloor.org>,
	Matthew Wilcox <matthew@....cx>,
	LKML <linux-kernel@...r.kernel.org>,
	Alexander Viro <viro@....linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>
Subject: Re: [patch] speed up / fix the new generic semaphore code (fix AIM7
 40% regression with 2.6.26-rc1)



On Thu, 8 May 2008, Linus Torvalds wrote:
> 
> Btw, sparse will complain about those, because the source code *looks* 
> really cheap.

Sometimes you can fix it.

For example, this change:

	-       if (pte_present(*pte) && page_to_pfn(page) == pte_pfn(*pte)) {
	+       if (pte_present(*pte) && page == pfn_to_page(pte_pfn(*pte))) {

can simplify things. Instead of converting a 'struct page' to a pfn, it 
converts a pfn to a 'struct page', and that is generally cheaper (a 
multiply by the size of struct page rather than a divide by it). The two 
tests aren't always equivalent, but I think in this case they are. For me, 
the code generation changes:

	-       movabsq $7905747460161236407, %rdx      #, tmp111
	-       movabsq $32985348833280, %rax   #, tmp107
	-       leaq    (%r12,%rax), %rax       #, tmp106
	-       sarq    $3, %rax        #, tmp106
	-       imulq   %rdx, %rax      # tmp111, tmp106
	-       movabsq $70368744177663, %rdx   #, tmp113
	-       andq    %rdx, %rcx      # tmp113, pte$pte
	-       shrq    $12, %rcx       #, pte$pte
	-       cmpq    %rcx, %rax      # pte$pte, tmp106
	+       movabsq $70368744177663, %rax   #, tmp107
	+       andq    %rax, %rdx      # tmp107, pte$pte
	+       shrq    $12, %rdx       #, pte$pte
	+       imulq   $56, %rdx, %rax #, pte$pte, tmp109
	+       movabsq $-32985348833280, %rdx  #, tmp111
	+       addq    %rdx, %rax      # tmp111, tmp110
	+       cmpq    %rax, %r13      # tmp110, page

which isn't a *huge* deal, but it certainly looks better. One fewer big 
constant, and one fewer shift.

It's not going to make a huge difference, though. That function is just 
called too much, and it would still be entirely data-dependent all the way 
through.

			Linus