Date:	Thu, 26 Feb 2009 12:17:52 +0900
From:	Tejun Heo <tj@...nel.org>
To:	"Luck, Tony" <tony.luck@...el.com>
CC:	Rusty Russell <rusty@...tcorp.com.au>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"hpa@...or.com" <hpa@...or.com>,
	"jeremy@...p.org" <jeremy@...p.org>, "cpw@....com" <cpw@....com>,
	"mingo@...e.hu" <mingo@...e.hu>
Subject: Re: [PATCH 09/10] percpu: implement new dynamic percpu allocator

Hello,

Luck, Tony wrote:
> ia64 started out with a pinned TLB entry to map the percpu space to the
> top 64K of address space (so that the compiler can generate ld/st instructions
> with a small negative offset from register r0 to access local-to-this-cpu
> objects).
> 
> Then we started using one of the ar.k* registers to hold the base
> physical address for each cpu's per-cpu area so that early parts of
> machine check code (which runs with MMU off) can access per-cpu variables.
> 
> Finally we found that certain transaction processing benchmarks ran faster
> if we let the cpu have free access to one extra TLB entry ... so we
> stopped pinning the per-cpu area, and wrote a s/w fault handler to
> insert the mapping on demand (using the ar.k3 register to get the
> physical address for the mapping).
> 
> N.B. ar.k3 is a medium-slow register ... I wouldn't want to use it
> in the code sequence for *every* per-cpu variable access.

Ah... I see, so the 64k limit comes from the small offset.  I think
what we can do is use the first chunk for static percpu variables.
We'll still be able to use the same accessor by doing something like...

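#define unified_percpu_accessor(ptr) ({ \
	/* pseudocode: r0, unit_size and ar.k3 stand for the ia64 bases */ \
	__builtin_constant_p(ptr) ? \
		r0 - unit_size + (ptr) :	/* static: small offset from r0 */ \
		ar.k3 + (ptr);			/* dynamic: base held in ar.k3 */ \
	})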

So, dynamic ones will be slower than normal ones but faster than what
we currently have (it will be faster than indirect pointer
dereferencing, right?) while keeping static accesses fast.  Does it
sound okay to you?  Also, does anyone know whether there's a working
ia64 emulator?  There doesn't seem to be any and it seems almost
impossible to get hold of an actual ia64 machine over here.  :-(
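
To make the accessor idea above concrete, here is a rough userspace
sketch of the compile-time dispatch.  It is not ia64 code: the two
arrays merely stand in for the "r0 - unit_size" and ar.k3 bases, and
UNIT_SIZE, unified_percpu_ptr() and the helper functions are names
invented for illustration.  It assumes GCC or Clang, since
__builtin_constant_p is a compiler extension.

```c
/* Userspace sketch only; all names below are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define UNIT_SIZE 64	/* assumed per-cpu unit size, for the demo only */

/* Stand-ins for the two base addresses in the pseudocode. */
static char static_unit[UNIT_SIZE];	/* plays the role of "r0 - unit_size" */
static char dynamic_unit[UNIT_SIZE];	/* plays the role of the ar.k3 base */

/*
 * Use the static base when the offset is a compile-time constant,
 * otherwise fall back to the dynamic base.
 */
#define unified_percpu_ptr(off) \
	(__builtin_constant_p(off) ? &static_unit[(off)] : &dynamic_unit[(off)])

/* A literal offset is a constant expression, so the static base is used. */
int constant_offset_is_static(void)
{
	return unified_percpu_ptr(8) == &static_unit[8];
}

/* An offset read from a volatile object is never a compile-time constant. */
int runtime_offset_is_static(void)
{
	volatile size_t off = 8;

	return unified_percpu_ptr(off) == &static_unit[8];
}

/* Quick self-check of both paths. */
void percpu_sketch_self_check(void)
{
	assert(constant_offset_is_static());
	assert(!runtime_offset_is_static());
}
```

The point is only that one accessor name can cover both cases, with
the choice made by the compiler rather than at run time.  Whether
__builtin_constant_p sees a real static percpu symbol as constant
would need to be checked on the actual toolchain.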

Thanks.

-- 
tejun