Message-ID: <456372AD.5080807@goop.org>
Date:	Tue, 21 Nov 2006 13:42:05 -0800
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Eric Dumazet <dada1@...mosbay.com>
CC:	Andi Kleen <ak@...e.de>, Ingo Molnar <mingo@...e.hu>,
	akpm@...l.org, Arjan van de Ven <arjan@...radead.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] i386-pda UP optimization

Eric Dumazet wrote:
> I did a *lot* of reboots of my Dell D610 machine, with some trivial
> benchmarks using pipe()/write()/read(), umask(), or getppid(), with
> and without oprofile.
>
> I managed to avoid reloading %gs in sysenter_entry
> (avoiding the two instructions: movl $(__KERNEL_PDA), %edx; movl %edx, %gs).
>
> I could not avoid reloading %gs in system_call, I don't know why, but
> modern glibc uses sysenter so I don't care :)
>
> I confirm I got better results with my patched kernel in all tests I've done.
>
> umask : 12.64 s instead of 12.90 s
> getppid : 13.37 s instead of 13.72 s
> pipe/read/write : 9.10 s instead of 9.52 s
>
> (I got very different results in the umask() bench after patching it
> not to use xchg(), since that instruction is expensive on x86 and
> really changes the oprofile results.  I will submit a patch for this.)
>   

Could you go into more detail about what you're actually measuring
here?  Is it 10,000,000 iterations of a single syscall?  pipe/read/write
suggests that you're doing at least 2 syscalls per iteration, yet it
shows the smallest elapsed time.

What are you using as your time reference?  Real time?  Process time?
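
That is, something along these lines, where the two clocks can disagree
noticeably if anything else is running (again just a sketch, not your
harness):

    /* read both time bases around the same loop */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static double secs(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        struct timespec r0, r1, c0, c1;
        long i;

        clock_gettime(CLOCK_MONOTONIC, &r0);           /* real time */
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);  /* process time */

        for (i = 0; i < 10000000L; i++)
            getppid();

        clock_gettime(CLOCK_MONOTONIC, &r1);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);

        printf("real %.3f s, process %.3f s\n",
               secs(r0, r1), secs(c0, c1));
        return 0;
    }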

For umask/getppid, assuming you're running 1e7 iterations, you're
seeing differences of roughly 26 and 35 ns per iteration.  I wonder why
it would differ between syscalls; I would expect it to be a constant
overhead either way.  Certainly these numbers are much larger than I
saw when I benchmarked pda-vs-nopda using lmbench's null syscall
(getppid) test: I saw an overall 9 ns difference in null syscall time
on my Core Duo running at 1 GHz.  What's your CPU and speed?
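
(To spell out the per-iteration arithmetic, assuming the 1e7 count:

    umask:   (12.90 s - 12.64 s) / 10,000,000 = 26 ns per iteration
    getppid: (13.72 s - 13.37 s) / 10,000,000 = 35 ns per iteration

versus the ~9 ns I measured with lmbench.)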

One possibility is a cache miss on the gdt while reloading %gs.  I've
been planning on a patch to rearrange the gdt in order to pack all the
commonly used segment descriptors into one or two cache lines so that
all the segment register reloads can be done with a minimum of cache
misses.  It would be interesting for you to replace the:

    movl $(__KERNEL_PDA), %edx; movl %edx, %gs

with an appropriate read of the gdt entry, hm, which is a bit complex to
find.
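
Purely to illustrate what I mean by a read of the gdt entry (the GDT
pointer here is a placeholder; getting at the real per-CPU GDT from the
entry path is the awkward part):

    /* hypothetical sketch: touch the PDA descriptor with an ordinary
     * load, instead of reloading %gs, so that a later segment load
     * finds its cache line warm.  'gdt' stands for whatever GDT base
     * is actually reachable; __KERNEL_PDA >> 3 converts the selector
     * to a descriptor index. */
    struct desc { unsigned int a, b; };   /* one 8-byte i386 descriptor */

    static inline void touch_pda_desc(const struct desc *gdt)
    {
        (void)*(volatile const unsigned int *)&gdt[__KERNEL_PDA >> 3];
    }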

    J
