Message-ID: <Pine.LNX.4.64.0711211051230.2468@schroedinger.engr.sgi.com>
Date: Wed, 21 Nov 2007 11:01:43 -0800 (PST)
From: Christoph Lameter <clameter@....com>
To: Andi Kleen <ak@...e.de>
cc: akpm@...ux-foundation.org, travis@....com,
Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
linux-kernel@...r.kernel.org
Subject: Re: [rfc 08/45] cpu alloc: x86 support
On Wed, 21 Nov 2007, Andi Kleen wrote:
> The whole mapping for all CPUs cannot fit into 2GB of course, but the reference
> linker managed range can.
Ok so you favor the solution where we subtract smp_processor_id() <<
shift?
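In other words something like this? (Just a sketch of my reading of that
variant; the function name and parameters are made up, not from the
patchset.)

	#include <linux/smp.h>		/* smp_processor_id() */

	/*
	 * Sketch only: take the address of this cpu's instance, subtract
	 * this cpu's slot to get back to the instance of cpu 0, then add
	 * the slot of the wanted cpu.
	 */
	static inline void *cpu_ptr_by_subtract(void *my_instance, int cpu,
						unsigned int shift)
	{
		unsigned long base = (unsigned long)my_instance -
			((unsigned long)smp_processor_id() << shift);

		return (void *)(base + ((unsigned long)cpu << shift));
	}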
> > The offset relative to %gs cannot be used if you have a loop and are
> > calculating the addresses for all instances. That is what we are talking
> > about. The CPU_xxx operations that are using the %gs register are fine and
> > are not affected by the changes we are discussing.
>
> Sure it can -- you just get the base address from a global array
> and then add the offset
Ok so generalize the data_offset for that case? I noted that other arches
and i386 have a similar solution there. I fiddled around some more and
found that the overhead introduced by the subtraction is equivalent to
loading the base as an 8 byte constant.

Keeping the use of data_offset can avoid the shift and the add for the
__get_cpu_var case that needs CPU_PTR(..., smp_processor_id()), because
the load from data_offset avoids the shifting and adding of
smp_processor_id().
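Roughly like this (illustrative only; read_data_offset() is a stand-in
for however the executing cpu's offset is fetched, f.e. a single load
relative to %gs -- not a real interface):

	extern unsigned long read_data_offset(void);	/* stand-in */

	/* Local case: one load (the offset) plus one add. */
	static inline void *this_cpu_ptr_sketch(void *reference_addr)
	{
		return (void *)((unsigned long)reference_addr +
				read_data_offset());
	}

	/* Arbitrary cpu: shift of the cpu number plus two adds. */
	static inline void *cpu_ptr_sketch(void *area_base,
					   unsigned long var_offset,
					   int cpu, unsigned int shift)
	{
		return (void *)((unsigned long)area_base +
				((unsigned long)cpu << shift) + var_offset);
	}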
For the loops this is not useful, since the compiler can move the
loading of the base pointer outside of the loop (if CPU_PTR needs to load
an 8 byte constant pointer).

With the 8 byte base loaded the loops actually become:

	sum = 0
	ptr = CPU_AREA_BASE
	while ptr < CPU_AREA_BASE + (NR_CPUS << shift) {
		sum += *ptr
		ptr += 1 << shift
	}
So I think we need to go with the implementation where CPU_PTR(var, cpu)
is

	CPU_AREA_BASE + (cpu << shift) + var_offset

The CPU_AREA_BASE will be loaded into a register. The var_offset usually
ends up being an offset in a mov instruction.
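In macro form something like the following (CPU_AREA_SHIFT and
var_offset() are stand-in names here, not necessarily what the patchset
will use):

	/*
	 * Sketch of the constant-base variant.  CPU_AREA_BASE is a
	 * link-time constant; CPU_AREA_SHIFT and var_offset() are
	 * stand-ins for the real names.
	 */
	#define CPU_PTR_SKETCH(var, cpu)				\
		((__typeof__(var) *)((unsigned long)CPU_AREA_BASE	\
			+ ((unsigned long)(cpu) << CPU_AREA_SHIFT)	\
			+ var_offset(var)))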
> >
> > > Then the reference data would be initdata and eventually freed.
> > > That is similar to how the current per cpu data works.
> >
> > Yes that is also how the current patchset works. I just do not understand
> > what you want changed.
>
> Anyways i think your current scheme cannot work (too much VM, placed at the wrong
> place; some wrong assumptions).
The constant pointer solution fixes that. No need to despair.