linux-kernel - Re: [RFC 00/15] x86_64: Optimize percpu accesses

Message-ID: <48764B58.5040209@goop.org>
Date:	Thu, 10 Jul 2008 10:48:08 -0700
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Christoph Lameter <cl@...ux-foundation.org>
CC:	"H. Peter Anvin" <hpa@...or.com>, Mike Travis <travis@....com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Arjan van de Ven <arjan@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jack Steiner <steiner@....com>, linux-kernel@...r.kernel.org,
	Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: [RFC 00/15] x86_64: Optimize percpu accesses

Christoph Lameter wrote:
> Jeremy Fitzhardinge wrote:
>   
>> The base address of the percpu area and the offsets from that base are
>> completely independent values.
>>     
>
> Definitely.
>
>
>   
>> The addressing modes:
>>
>>    * ABS
>>    * off(%rip)
>>
>> Are exactly equivalent in what offsets they can generate, so long as *at
>> link time* the percpu *symbols* are within 2G of the code addressing
>> them.  *After* the addressing mode has generated an effective address
>> (by whatever means it likes), the %gs: override applies the segment
>> base, which can therefore offset the effective address to anywhere at all.
>>     
>
> Right. The problem is with the percpu area handled by the linker. That percpu area is used by the boot cpu and later we set up additional per cpu areas. Those can be placed in an arbitrary way if one goes through a table of pointers to these areas.
>   

Yes, but the offset is the same either way.  When you want a cpu to 
refer to its own percpu memory, regardless of where it is in memory, you 
just reload the gs base.  The offsets are the same everywhere, and are 
computed by the linker without knowledge of or reference to where the 
final address will end up.

In other words, at source level:

	a = x86_read_percpu(foo)

will generate

	mov %gs:percpu__foo, %rax

where the linker decides the value of percpu__foo, which can be up to 
4G.  Or if we use rip-relative:

	mov %gs:percpu__foo(%rip), %rax

we end up with the same result, except that the generated instruction is 
a bit more compact.

In the final generated assembly, it ends up being a hardcoded constant 
address.  Say, 0x7838.

Now if we allocate cpu 43 percpu data at 0xfffffffff7198000, we load %gs 
base with that value, and then the instruction is still

	mov %gs:0x7838, %rax

and the computed address will be 0xfffffffff7198000 + 0x7838 = 
0xfffffffff719f838.

And cpu 62 has its percpu data at 0xffffffffe3819000, and the 
instruction is still

	mov %gs:0x7838, %rax

and the computed address for its version of percpu__foo is 
0xffffffffe3819000 + 0x7838 = 0xffffffffe3820838.

Note that it doesn't matter how you decide to place the percpu data, so 
long as you can load the address into the %gs base.
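
As a minimal sketch (an illustration only, not kernel code), the 
arithmetic in the two examples above can be modelled in C.  The helper 
names gs_effective_address() and check_example_addresses() are made up 
for this example; the numbers are the ones used above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of "mov %gs:offset, %rax": the instruction encodes
 * only the link-time offset, and the %gs base is added afterwards.
 */
static uint64_t gs_effective_address(uint64_t gs_base, uint64_t offset)
{
	return gs_base + offset;
}

/* Same offset for every cpu; only the %gs base differs. */
static void check_example_addresses(void)
{
	const uint64_t percpu__foo = 0x7838;	/* offset chosen by the linker */

	/* cpu 43 */
	assert(gs_effective_address(0xfffffffff7198000ULL, percpu__foo)
	       == 0xfffffffff719f838ULL);
	/* cpu 62 */
	assert(gs_effective_address(0xffffffffe3819000ULL, percpu__foo)
	       == 0xffffffffe3820838ULL);
}
```

The point of the sketch is that percpu__foo appears as the same constant 
in both calls; nothing about the instruction changes when the base does.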

> However, that does not work if one calculates the virtual address instead of looking up a physical address.
>   

Calculate a virtual address for what?  Physical address for what?  If 
you have a large virtual region allocating 256M of percpu space, er, per 
cpu, then you just load %gs base with percpu_region_base + cpuid * 
256M.  It has no effect on the instructions accessing that percpu space.
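
A rough sketch of that base calculation, assuming a fixed 256M stride 
per cpu.  PERCPU_STRIDE and percpu_gs_base() are hypothetical names for 
this illustration, not existing kernel symbols:

```c
#include <stdint.h>

/* Assumed size of each cpu's slot in the virtual region: 256M. */
#define PERCPU_STRIDE (256ULL << 20)

/*
 * Value to load into the %gs base for a given cpu.  The instructions
 * that access percpu variables never see this value; they only carry
 * the link-time offset.
 */
static uint64_t percpu_gs_base(uint64_t percpu_region_base, unsigned int cpu)
{
	return percpu_region_base + (uint64_t)cpu * PERCPU_STRIDE;
}
```

Changing the stride or the region base only changes what gets loaded 
into the %gs base at cpu bringup, not the generated code.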

    J