linux-kernel - Re: [RFC PATCH 0/3] Implement getcpu_cache system call

Message-ID: <467525713.343916.1452604549209.JavaMail.zimbra@efficios.com>
Date:	Tue, 12 Jan 2016 13:15:49 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Ben Maurer <bmaurer@...com>
Cc:	Josh Triplett <josh@...htriplett.org>,
	Shane M Seymour <shane.seymour@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org,
	linux-api <linux-api@...r.kernel.org>,
	Andy Lutomirski <luto@...capital.net>,
	Andi Kleen <andi@...stfloor.org>,
	Dave Watson <davejwatson@...com>, Chris Lameter <cl@...ux.com>,
	Ingo Molnar <mingo@...hat.com>, rostedt <rostedt@...dmis.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Russell King <linux@....linux.org.uk>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>,
	Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [RFC PATCH 0/3] Implement getcpu_cache system call

----- On Jan 11, 2016, at 11:27 PM, Ben Maurer bmaurer@...com wrote:

> One disadvantage of only allowing one is that high performance server
> applications tend to statically link. It'd suck to have to go through whatever
> type of relocation we'd need to pull this out of glibc. But if there's only one
> registration allowed a statically linked app couldn't create its own if glibc
> might use it some day.

One idea I have would be to let the kernel reserve some space either after the
first stack address (for a stack growing down) or at the beginning of the
allocated TLS area for each thread in copy_thread_tls() by fiddling with
sp or the tls base address when creating a thread.

In theory, this would allow always returning the same address, and the memory
would exist as long as the thread exists.
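
To make the stack variant concrete, here is a minimal userspace sketch of the
address arithmetic only, not kernel code. The names reserve_below_stack_top,
struct reserved_slot, CPU_CACHE_SIZE and STACK_ALIGN are hypothetical, and the
4-byte slot and 16-byte stack alignment are assumptions made for illustration;
what copy_thread_tls() would really have to do is architecture-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the cache slot holds one 32-bit CPU number. */
#define CPU_CACHE_SIZE  sizeof(int32_t)
/* Assumption: the ABI wants a 16-byte aligned initial stack pointer. */
#define STACK_ALIGN     ((uintptr_t)16)

/* Hypothetical result of carving the slot out of a downward-growing stack. */
struct reserved_slot {
	uintptr_t cache_addr;	/* where the CPU number would be stored */
	uintptr_t new_sp;	/* initial stack pointer handed to the thread */
};

/*
 * Reserve the slot just below the highest stack address, then start the
 * thread's stack below the slot, rounded down to the assumed alignment.
 */
static struct reserved_slot reserve_below_stack_top(uintptr_t stack_top)
{
	struct reserved_slot r;

	assert(stack_top >= CPU_CACHE_SIZE);
	/* Naturally align the slot for a 32-bit store. */
	r.cache_addr = (stack_top - CPU_CACHE_SIZE) &
		       ~(uintptr_t)(CPU_CACHE_SIZE - 1);
	/* The thread never sees the slot as part of its usable stack. */
	r.new_sp = r.cache_addr & ~(STACK_ALIGN - 1);
	return r;
}
```

Since the slot address depends only on the stack top, every lookup for a given
thread would return the same address, which is the property described above.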

Not sure whether it may have unforeseen impact though.

Thoughts ?

Thanks,

Mathieu

> 
>> On Jan 11, 2016, at 6:46 PM, Josh Triplett <josh@...htriplett.org> wrote:
>> 
>>> On Tue, Jan 12, 2016 at 12:49:18AM +0000, Mathieu Desnoyers wrote:
>>> ----- On Jan 11, 2016, at 6:03 PM, Josh Triplett josh@...htriplett.org wrote:
>>> 
>>>>> On Mon, Jan 11, 2016 at 10:38:28PM +0000, Seymour, Shane M wrote:
>>>>> I have some concerns and suggestions for you about this.
>>>>> 
>>>>> What's to stop someone in user space from requesting an arbitrarily large number
>>>>> of CPU # cache locations that the kernel needs to allocate memory to track and
>>>>> each time the task migrates to a new CPU it needs to update them all? Could you
>>>>> use it to dramatically slow down a system/task switching? Should there be a
>>>>> ulimit type value or a sysctl setting to limit the number that you're allowed
>>>>> to register per-task?
>>>> 
>>>> The documented behavior of the syscall allows only one location per
>>>> thread, so the kernel can track that one and only address rather easily
>>>> in the task_struct.  Allowing dynamic allocation definitely doesn't seem
>>>> like a good idea.
>>> 
>>> The current implementation now allows more than one location per
>>> thread. Which piece of documentation states that only one location
>>> per thread is allowed ? This was indeed the case for the prior
>>> implementations, but I moved to implementing a linked-list of
>>> cpu_cache areas per thread to allow the getcpu_cache system call to
>>> be used by more than a single shared object within a given program.
>> 
>> Ah, I missed that change.
>> 
>>> Without the linked list, as soon as more than one shared object tries
>>> to register their cache, the first one will prohibit all others from
>>> doing so.
>>> 
>>> We could perhaps try to document that this system call should only
>>> ever be used by *libc, and all libraries and applications should
>>> then use the libc TLS cache variable, but it seems rather fragile,
>>> and any app/lib could try to register its own cache.
>> 
>> That does seem a bit fragile, true; on the other hand, the linked-list
>> approach would allow userspace to allocate an unbounded amount of kernel
>> memory, without any particular control on it.  That doesn't seem
>> reasonable.  Introducing an rlimit or similar for this seems like
>> massive overkill, and hardcoding a fixed limit breaks the 0-1-infinity
>> rule.
>> 
>> Given that any registered location will always provide the same value,
>> allowing only a single registration doesn't seem *too* problematic;
>> libc-based programs can use the libc implementation, and non-libc-based
>> programs can register a location themselves.  And users of this API will
>> already likely want to use some TLS mechanism, which already interacts
>> heavily with libc (set_thread_area/clone).
>> 
>> Allowing only one registration at a time seems preferable to introducing
>> another way to allocate kernel resources on a process's behalf.
>> 
>> - Josh Triplett
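
For reference, the single-registration policy discussed in the quoted text can
be sketched in userspace C as follows. This is only an illustration under
stated assumptions: struct task_sketch stands in for the relevant field of
task_struct, register_cpu_cache and on_migrate are hypothetical names, and the
choice of -EBUSY for a conflicting registration is an assumption, not what the
patch set does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread state kept in task_struct. */
struct task_sketch {
	int32_t *cpu_cache;	/* NULL until a location is registered */
};

/*
 * Allow exactly one registered location per thread. Registering the same
 * address again succeeds, so independent callers can share it; a different
 * address is refused (assumed error code: -EBUSY).
 */
static int register_cpu_cache(struct task_sketch *t, int32_t *addr)
{
	if (addr == NULL)
		return -EINVAL;
	if (t->cpu_cache == NULL) {
		t->cpu_cache = addr;
		return 0;
	}
	return t->cpu_cache == addr ? 0 : -EBUSY;
}

/* On migration there is a single store, with no list to walk. */
static void on_migrate(struct task_sketch *t, int32_t cpu)
{
	if (t->cpu_cache != NULL)
		*t->cpu_cache = cpu;
}
```

With this policy the per-thread cost is one pointer and one store per
migration, so userspace cannot make the kernel allocate an unbounded amount of
memory; the price is that a second, independent registration in the same
thread fails.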

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com