Date: Wed, 21 Nov 2007 11:01:43 -0800 (PST)
From: Christoph Lameter <clameter@....com>
To: Andi Kleen <ak@...e.de>
cc: akpm@...ux-foundation.org, travis@....com,
    Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
    linux-kernel@...r.kernel.org
Subject: Re: [rfc 08/45] cpu alloc: x86 support

On Wed, 21 Nov 2007, Andi Kleen wrote:

> The whole mapping for all CPUs cannot fit into 2GB of course, but the
> reference linker managed range can.

Ok, so you favor the solution where we subtract smp_processor_id() << shift?

> > The offset relative to %gs cannot be used if you have a loop and are
> > calculating the addresses for all instances. That is what we are talking
> > about. The CPU_xxx operations that are using the %gs register are fine and
> > are not affected by the changes we are discussing.
>
> Sure it can -- you just get the base address from a global array
> and then add the offset

Ok, so generalize data_offset for that case? I noted that the other arches
and i386 have a similar solution there.

I fiddled around some more and found that the overhead the subtraction
introduces is equivalent to loading an 8-byte constant for the base.

Keeping the use of data_offset can avoid the shift and the add for the
__get_cpu_var case that needs CPU_PTR(..., smp_processor_id()), because
the load from data_offset avoids the shifting and adding of
smp_processor_id(). For the loops this is not useful, since the compiler
can move the loading of the base pointer outside of the loop (if CPU_PTR
needs to load an 8-byte constant pointer). With the 8-byte base loaded,
the loops actually become:

	sum = 0;
	ptr = CPU_AREA_BASE;
	while (ptr < CPU_AREA_BASE + (NR_CPUS << shift)) {
		sum += *ptr;
		ptr += 1 << shift;
	}

So I think we need to go with the implementation where

	CPU_PTR(var, cpu) == CPU_AREA_BASE + (cpu << shift) + var_offset

The CPU_AREA_BASE will be loaded into a register. The var_offset usually
ends up being an offset in a mov instruction.
> > > > Then the reference data would be initdata and eventually freed.
> > > That is similar to how the current per cpu data works.
> >
> > Yes that is also how the current patchset works. I just do not understand
> > what you want changed.
>
> Anyways i think your current scheme cannot work (too much VM, placed at the wrong
> place; some wrong assumptions). The constant pointer solution fixes that.

No need to despair.