Message-ID: <53CEC8AC.7020700@hp.com>
Date: Tue, 22 Jul 2014 16:25:16 -0400
From: Waiman Long <waiman.long@...com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Darren Hart <dvhart@...ux.intel.com>,
Andy Lutomirski <luto@...capital.net>,
Andi Kleen <andi@...stfloor.org>,
Ingo Molnar <mingo@...nel.org>,
Davidlohr Bueso <davidlohr@...com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
Jason Low <jason.low2@...com>,
Scott J Norton <scott.norton@...com>,
Robert Haas <robertmhaas@...il.com>
Subject: Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex
On 07/22/2014 05:59 AM, Thomas Gleixner wrote:
> On Tue, 22 Jul 2014, Peter Zijlstra wrote:
>> On Tue, Jul 22, 2014 at 10:39:17AM +0200, Thomas Gleixner wrote:
>>> On Tue, 22 Jul 2014, Peter Zijlstra wrote:
>>>> Anyway, there is one big fail in the entire futex stack that we 'need'
>>>> to sort some day and that is NUMA. Some people (again database people)
>>>> explicitly do not use futexes and instead use sysvsem because of this.
>>>>
>>>> The problem with numa futexes is that because they're vaddr based there
>>>> is no (persistent) node information. You always end up having to fall
>>>> back to looking in all nodes before you can guarantee there is no
>>>> matching futex.
>>>>
>>>> One way to achieve it is by extending the futex value to include a node
>>>> number, but that's obviously a complete ABI break. Then again, it should
>>>> be pretty straight fwd, since the node number doesn't need to be part of
>>>> the actual atomic update part, just part of the userspace storage.
>>> So you want per node hash buckets, right? Fair enough, but how do you
>>> make sure, that no thread/process on a different node is fiddling with
>>> that "node bound" futex as well?
>> You don't and that should work just as well, just slower. But since the
>> node id is in the futex 'value' we'll always end up in the right
>> node-hash, even if it's a remote one.
>>
>> So yes, per node hashes, and a persistent futex->node map.
> Which works fine as long as you only have the futex_q on the stack of
> the blocked task. If user space is lying to you, then you just end up
> with a bunch of threads sleeping forever. Who cares?
>
> But if you create independent kernel state, which we have with
> pi_state and which you need for finegrained locking and further
> spinning fun, you open up another can of worms. Simply because this
> would enable rogue user space to create multiple instances of the
> kernel internal state. I can predict the CVEs resulting from that
> even without using a crystal ball.
>
> Thanks,
>
> tglx
I think a NUMA futex, if implemented, is a completely independent piece
that has no direct relationship with the optimistic spinning futex. It
should be a separate patch and not be mixed into the optimistic spinning
patch, which would only make the latter more complicated.
-Longman