linux-kernel - Re: [PATCH 09/10] percpu: implement new dynamic percpu allocator

Date:	Thu, 26 Feb 2009 12:17:52 +0900
From:	Tejun Heo <tj@...nel.org>
To:	"Luck, Tony" <tony.luck@...el.com>
CC:	Rusty Russell <rusty@...tcorp.com.au>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"hpa@...or.com" <hpa@...or.com>,
	"jeremy@...p.org" <jeremy@...p.org>, "cpw@....com" <cpw@....com>,
	"mingo@...e.hu" <mingo@...e.hu>
Subject: Re: [PATCH 09/10] percpu: implement new dynamic percpu allocator

Hello,

Luck, Tony wrote:
> ia64 started out with a pinned TLB entry to map the percpu space to the
> top 64K of address space (so that the compiler can generate ld/st instructions
> with a small negative offset from register r0 to access local-to-this-cpu
> objects).
> 
> Then we started using one of the ar.k* registers to hold the base
> physical address for each cpu's per-cpu area so that early parts of
> machine check code (which runs with MMU off) can access per-cpu variables.
> 
> Finally we found that certain transaction processing benchmarks ran faster
> if we let the cpu have free access to one extra TLB entry ... so we
> stopped pinning the per-cpu area, and wrote a s/w fault handler to
> insert the mapping on demand (using the ar.k3 register to get the
> physical address for the mapping).
> 
> N.B. ar.k3 is a medium-slow register ... I wouldn't want to use it
> in the code sequence for *every* per-cpu variable access.

Ah... I see, so the 64k limit comes from the small offset.  I think
what we can do is use the first chunk for static percpu variables.
We'll still be able to use the same accessor by doing something like...

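/* pseudocode: r0 and ar.k3 stand for the ia64 registers */
#define unified_percpu_accessor(ptr) ({ \
	__builtin_constant_p(ptr) \
		? r0 - unit_size + (ptr)	/* static: small offset from r0 */ \
		: ar.k3 + (ptr);		/* dynamic: base read from ar.k3 */ \
	})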

So, dynamic ones will be slower than normal ones but faster than what
we currently have (it will be faster than indirect pointer
dereferencing, right?) while keeping static accesses fast.  Does it
sound okay to you?  Also, does anyone know whether there's a working
ia64 emulator?  There doesn't seem to be any and it seems almost
impossible to get hold of an actual ia64 machine over here.  :-(
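
To make the dispatch idea concrete, here is a rough userspace sketch
in plain C (GCC/Clang, since it relies on __builtin_constant_p).  It
is only an illustration, not kernel or ia64 code: the buffer `unit`,
the helpers fixed_top() and read_base_slow(), and the 64K UNIT_SIZE
are all made-up stand-ins for the fixed mapping reached from r0 and
for a base address read from ar.k3.  Both paths resolve to the same
simulated unit, so only the way the base is obtained differs.

```c
#include <assert.h>
#include <stddef.h>

#define UNIT_SIZE 65536u	/* assumption: one 64K per-cpu unit */

/* Stand-in for this CPU's per-cpu unit (hypothetical, userspace only). */
static unsigned char unit[UNIT_SIZE];

/* Counts how often the simulated "slow register" is read. */
static unsigned long slow_reads;

/* Stand-in for the fixed mapping: the address just past the unit,
 * playing the role of "r0" in "r0 - unit_size + ptr". */
static unsigned char *fixed_top(void)
{
	return unit + UNIT_SIZE;
}

/* Stand-in for reading the base from ar.k3 (the slower path). */
static unsigned char *read_base_slow(void)
{
	slow_reads++;
	return unit;
}

/* Compile-time-constant offsets take the fixed-base path; anything
 * else pays for a base read.  Only one branch is evaluated. */
#define percpu_addr(off) \
	(__builtin_constant_p(off) \
		? (void *)(fixed_top() - UNIT_SIZE + (off)) \
		: (void *)(read_base_slow() + (off)))

/* Returns 1 if both paths land on the expected addresses. */
int percpu_demo(void)
{
	volatile size_t dyn_off = 32;	/* volatile: never a constant */
	void *s = percpu_addr(16);	/* literal: constant path */
	void *d = percpu_addr(dyn_off);	/* runtime value: slow path */

	return s == (void *)(unit + 16) &&
	       d == (void *)(unit + 32) &&
	       slow_reads >= 1;
}
```

The point of the sketch is just that a single accessor name can cover
both cases, with the compiler picking the cheap path whenever the
offset is known at build time.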

Thanks.

-- 
tejun