lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45637940.1090608@goop.org>
Date:	Tue, 21 Nov 2006 14:10:08 -0800
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Andi Kleen <ak@...e.de>
CC:	Eric Dumazet <dada1@...mosbay.com>, Ingo Molnar <mingo@...e.hu>,
	akpm@...l.org, Arjan van de Ven <arjan@...radead.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] i386-pda UP optimization

Andi Kleen wrote:
>> For umask/getppid, assuming you're just running 1e7 iterations, you're
>> seeing a difference of 25 and 35ns per iteration difference.  I wonder
>> why it would be different for different syscalls; I would expect it to
>> be a constant overhead either way.
>>     
>
> They got different numbers of current references? 
>   

My understanding is that Eric has changed UP current (and other PDA ops)
to not touch %gs at all, and the difference in reported times in due
omitting the %gs load in entry.S (though %gs is still save/restored on
the stack).

> On such micro benchmarks everything should be cache hot in theory
> (unless it's a system with really small cache)
>   

Yes, that would be my thought too, but maybe there's excessive aliasing
on one of the ways, but I think he's using a Pentium M which has a 8-way L1.

>> been planning on a patch to rearrange the gdt in order to pack all the
>> commonly used segment descriptors into one or two cache lines so that
>> all the segment register reloads can be done with a minimum of cache
>> misses.  It would be interesting for you to replace the:
>>
>>     movl $(__KERNEL_PDA), %edx; movl %edx, %gs
>>
>> with an appropriate read of the gdt entry, hm, which is a bit complex to
>> find.
>>     
>
> On UP it could be hardcoded. And oprofile can be used to profile for cache misses.
>   

Yes, assuming oprofile doesn't interfere with things too much. 
Actually, just counting cache miss events during the course of a syscall
would be most interesting (ie, no need to sample).


    J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ