linux-kernel - Re: [patch v2] epoll use a single inode ...

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200703061910.13431.dada1@cosmosbay.com>
Date:	Tue, 6 Mar 2007 19:10:13 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	"H. Peter Anvin" <hpa@...or.com>,
	Davide Libenzi <davidel@...ilserver.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [patch v2] epoll use a single inode ...

On Tuesday 06 March 2007 18:28, Eric Dumazet wrote:
> On Tuesday 06 March 2007 18:19, Linus Torvalds wrote:
>
> > > Using reciprocal divides permits to change each divide by two
> > > multiplies, less expensive on current CPUS.
> >
> > Are you sure?
>
> I am going to test this, but at least on Opterons, the reciprocal divide I
> added into mm/slab.c gave me a nice speedup.
>

With attached test program (one million calls to pipe()), I got about 0.1 s of 
speedup on my 1.6 GHz Pentium(M) dell D610 machine
(3.350 s instead of 3.450 s, on many runs)

Thats about 100 ns saved per number() call

But then I realized that on ia32, gcc compilers is not very smart :

static inline u32 reciprocal_divide(u32 A, u32 R)
{
	return (u32)(((u64)A * R) >> 32); 
}

It generates two multiplies... arg...

//begin of reciprocal_divide()
     4b0:       8b 4c 24 28             mov    0x28(%esp),%ecx
     4b4:       89 f0                   mov    %esi,%eax
     4b6:       f7 64 24 24             mull   0x24(%esp)
     4ba:       0f af ce                imul   %esi,%ecx
     4bd:       8d 14 11                lea    (%ecx,%edx,1),%edx
// end of reciprocal_divide()
     4c0:       8b 8c 24 8c 00 00 00    mov    0x8c(%esp),%ecx
     4c7:       89 d0                   mov    %edx,%eax
     4c9:       0f af c8                imul   %eax,%ecx
     4cc:       29 ce                   sub    %ecx,%esi
     4ce:       8b 4c 24 1c             mov    0x1c(%esp),%ecx
     4d2:       0f b6 34 31             movzbl (%ecx,%esi,1),%esi
     4d6:       89 f1                   mov    %esi,%ecx
     4d8:       89 c6                   mov    %eax,%esi
     4da:       88 0f                   mov    %cl,(%edi)
     4dc:       47                      inc    %edi
     4dd:       ff 44 24 20             incl   0x20(%esp)
     4e1:       85 c0                   test   %eax,%eax
     4e3:       75 cb                   jne    4b0 <number+0x160>

So even with a total of 3 multiplies per digit, we win...

Maybe some bit of x86 asm is needed to make gcc be smarter (using only one 
multiply for  reciprocal_divide())


View attachment "pipes.c" of type "text/plain" (242 bytes)