lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 27 Aug 2006 19:21:55 +0200
From:	Andreas Mohr <andi@...x01.fht-esslingen.de>
To:	Jeremy Fitzhardinge <jeremy@...p.org>
Cc:	linux-kernel@...r.kernel.org,
	Chuck Ebbert <76306.1226@...puserve.com>,
	Zachary Amsden <zach@...are.com>,
	Jan Beulich <jbeulich@...ell.com>, Andi Kleen <ak@...e.de>,
	Andrew Morton <akpm@...l.org>
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.

Hi,

On Sun, Aug 27, 2006 at 01:44:17AM -0700, Jeremy Fitzhardinge wrote:
> This patch implements per-processor data areas by using %gs as the
> base segment of the per-processor memory.  This has two principle
> advantages:
> 
> - It allows very simple direct access to per-processor data by
>   effectively using an effective address of the form %gs:offset, where
>   offset is the offset into struct i386_pda.  These sequences are faster
>   and smaller than the current mechanism using current_thread_info().

Yess!!
Something like that had to be done eventually about the inefficient
current_thread_info() mechanism, but I wasn't sure what exactly.

> I haven't measured performance yet, but when using the PDA for "current"
> and "smp_processor_id", I see a 5715 byte reduction in .text segment
> size for my kernel.

This is interesting, since even by doing a non-elegant
current->... --> struct task_struct *tsk = current;
replacement for excessive uses of current, I was able to gain almost 10kB
within a single file already!
I guess it's due to having tried that on an older installation with gcc 3.2,
which probably does less efficient opcode merging of current_thread_info()
requests compared to a current gcc version.
IOW, .text segment reduction could be quite a bit higher for older gcc:s.

> This uses the x86 segmentation stuff in a way similar to NPTL's way of
> implementing Thread-Local Storage.  It relies on the fact that each CPU
> has its own Global Descriptor Table (GDT), which is basically an array
> of base-length pairs (with some extra stuff).  When a segment register
> is loaded with a descriptor (approximately, an index in the GDT), and
> you use that segment register for memory access, the address has the
> base added to it, and the resulting address is used.

Not a problem for more daring user-space apps (i.e. Wine), I hope?

Andreas Mohr

-- 
No programming skills!? Why not help translate many Linux applications! 
https://launchpad.ubuntu.com/rosetta
(or alternatively buy nicely packaged Linux distros/OSS software to help
support Linux developers creating shiny new things for you?)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists