lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 8 Aug 2006 07:39:57 -0700
From:	"Ulrich Drepper" <drepper@...il.com>
To:	"Eric Dumazet" <dada1@...mosbay.com>
Cc:	"Andi Kleen" <ak@...e.de>,
	"Ravikiran G Thirumalai" <kiran@...lex86.org>,
	"Shai Fultheim (Shai@...lex86.org)" <shai@...lex86.org>,
	"pravin b shelar" <pravin.shelar@...softinc.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC] NUMA futex hashing

On 8/8/06, Eric Dumazet <dada1@...mosbay.com> wrote:
> The validity of the virtual address is still tested by normal get_user()
> call.. If the memory was freed by a thread, then a normal EFAULT error will
> be reported... eventually.

This is indeed what should be done.  Private futexes are the by far
more frequent case and I bet you'd see improvements when avoiding the
mm mutex even for normal machines since futexes really are everywhere.
 For shared mutexes you end up doing two lookups and that's fine IMO
as long as the first lookup is fast.

As for the NUMA case, I would oppose any change which has the
slightest impact on non-NUMA machines.  It cannot be allowed that the
majority of systems is slowed down significantly just because of NUMA.
 Especially since the effects of NUMA beside cache line transfer
penalties IMO probably are neglect able.  The in-kernel futex
representation only exists when there are waiters and so the memory
needed is only allocated when we a waiting.  In this case it just be
easy enough to use local memory.  But this unlikely will help much
since the waker thread is ideally not on the same processor, maybe not
even on the same node.  So there will be cacheline transfers in most
cases and everything possible improvement will be minimal and maybe
even not generally measurable.

If you want to do anything in this area, first remove the global
mutex.  Then really measure with real world application.  And I don't
mean specially designed HPC apps which assign threads/processes to
processors or nodes.  Those are special cases of a special case.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ