linux-kernel - Re: [RFC] NUMA futex hashing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 8 Aug 2006 07:39:57 -0700
From:	"Ulrich Drepper" <drepper@...il.com>
To:	"Eric Dumazet" <dada1@...mosbay.com>
Cc:	"Andi Kleen" <ak@...e.de>,
	"Ravikiran G Thirumalai" <kiran@...lex86.org>,
	"Shai Fultheim (Shai@...lex86.org)" <shai@...lex86.org>,
	"pravin b shelar" <pravin.shelar@...softinc.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC] NUMA futex hashing

On 8/8/06, Eric Dumazet <dada1@...mosbay.com> wrote:
> The validity of the virtual address is still tested by normal get_user()
> call.. If the memory was freed by a thread, then a normal EFAULT error will
> be reported... eventually.

This is indeed what should be done.  Private futexes are the by far
more frequent case and I bet you'd see improvements when avoiding the
mm mutex even for normal machines since futexes really are everywhere.
 For shared mutexes you end up doing two lookups and that's fine IMO
as long as the first lookup is fast.

As for the NUMA case, I would oppose any change which has the
slightest impact on non-NUMA machines.  It cannot be allowed that the
majority of systems is slowed down significantly just because of NUMA.
 Especially since the effects of NUMA beside cache line transfer
penalties IMO probably are neglect able.  The in-kernel futex
representation only exists when there are waiters and so the memory
needed is only allocated when we a waiting.  In this case it just be
easy enough to use local memory.  But this unlikely will help much
since the waker thread is ideally not on the same processor, maybe not
even on the same node.  So there will be cacheline transfers in most
cases and everything possible improvement will be minimal and maybe
even not generally measurable.

If you want to do anything in this area, first remove the global
mutex.  Then really measure with real world application.  And I don't
mean specially designed HPC apps which assign threads/processes to
processors or nodes.  Those are special cases of a special case.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/