Message-ID: <D0DEF1AE.B7EDE%dvhart@linux.intel.com>
Date:	Fri, 16 Jan 2015 17:33:05 -0800
From:	Darren Hart <dvhart@...ux.intel.com>
To:	"Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>
CC:	Carlos O'Donell <carlos@...hat.com>, Ingo Molnar <mingo@...e.hu>,
	Jakub Jelinek <jakub@...hat.com>,
	"linux-man@...r.kernel.org" <linux-man@...r.kernel.org>,
	lkml <linux-kernel@...r.kernel.org>,
	Arnd Bergmann <arnd@...db.de>,
	Steven Rostedt <rostedt@...dmis.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Linux API <linux-api@...r.kernel.org>,
	Davidlohr Bueso <dave@...olabs.net>
Subject: Re: futex(2) man page update help request

Corrected Davidlohr's email address.

On 1/15/15, 7:12 AM, "Michael Kerrisk (man-pages)"
<mtk.manpages@...il.com> wrote:

>Hello Darren,
>
>I give you the same apology as to Thomas for the
>long-delayed response to your mail.
>
>And I repeat my note to Thomas:
>In the next day or two, I hope to send out the new version
>of the futex(2) page for review. The new draft is a bit
>bigger (okay -- 4 x bigger) than the current page. And there
>are quite a number of FIXMEs that I've placed in the page
>for various points--some minor, but a few major--that need
>to be checked or fixed. Would you have some time to review
>that page?

I'll make the time for that. I've wanted to see this for a while, so thank
you for working on it!

> 
>
>In the meantime, I have a couple of questions, which, if
>you could answer them, I would work some changes into the
>page before sending.
>
>1. In various places, a distinction is made between non-PI
>   futexes and PI futexes. But what determines that distinction?
>   From the kernel's perspective, what makes a futex one type
>   or another? I presume it is to do with the types of blocking
>   waiters on the futex, but it would be good to have a formal
>   definition.

You're right in that a uaddr is a uaddr is a uaddr. Also "there is no such
thing as a futex", it doesn't exist as any kind of identifiable object, so
these discussions can get rather confusing :-)

A "futex" becomes a PI futex when it is "created" via a PI futex op code.
At that point, the syscall will ensure a pi_state is populated for the
futex_q entry. See futex_lock_pi() for example. Before the locks are
taken, there is a call to refill_pi_state_cache() which preps a pi_state
for assignment later in futex_lock_pi_atomic(). This pi_state provides the
necessary linkage to perform the priority boosting in the event of a
priority inversion. This is handled externally from the futexes via the
rt_mutex construct.

Clear as mud?


>
>2. Can you say something about the pairing requirements of
>   FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI?
>   What is the requirement and why do we need it?

Briefly, these op codes exist to support a fairly specific use case:
support for PI aware pthread condvars (glibc patch acceptance STILL
PENDING FOR LOVE OF EVERYTHING HOLY WHY?!?!?! But it is shipped with
various PREEMPT_RT enabled Linux systems). Because these calls are paired,
and more of the logic happens on the kernel side (to preserve ownership of
an rt_mutex with waiters), we pre-specify the target of the requeue in
futex_wait_requeue_pi to ensure userspace and kernelspace remain in sync.
This also limits the attack surface by only supporting exactly what it was
meant to do. The corner cases get insane otherwise.

We could walk through the various ways in which it would break if these
pairing restrictions were not in place, but I'll have to take some serious
time to page all those into working memory. Let me know if we need more
detail here and I will.
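
To make the pairing concrete, here is a rough C sketch of how the two calls line up. The wrapper names (wait_requeue_pi, cmp_requeue_pi) and parameter names are hypothetical; only the op codes and the raw futex syscall are real. It assumes a condvar word that is a non-PI futex and a mutex word that is a PI futex, and it is not the glibc implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Waiter side (hypothetical wrapper): block on the condvar word and name,
 * in advance, the PI futex this thread expects to be requeued to.
 * abs_timeout may be NULL for no timeout. */
static long wait_requeue_pi(uint32_t *cond, uint32_t expected,
                            uint32_t *pi_mutex,
                            const struct timespec *abs_timeout)
{
    assert(cond != pi_mutex); /* the two words must be distinct */
    return syscall(SYS_futex, cond, FUTEX_WAIT_REQUEUE_PI, expected,
                   abs_timeout, pi_mutex, 0);
}

/* Waker side (hypothetical wrapper): requeue waiters from the condvar word
 * to the same PI futex the waiters named. The wake count must be 1; the
 * requeue limit travels in the argument slot otherwise used for the
 * timeout, and "expected" is compared against *cond. */
static long cmp_requeue_pi(uint32_t *cond, uint32_t expected,
                           uint32_t *pi_mutex, int nr_requeue)
{
    assert(cond != pi_mutex);
    return syscall(SYS_futex, cond, FUTEX_CMP_REQUEUE_PI, 1,
                   (unsigned long)nr_requeue, pi_mutex, expected);
}
```

Both sides pass the same pi_mutex address. That shared target is the pairing: a waiter blocked in wait_requeue_pi() is only ever moved by a cmp_requeue_pi() naming that same PI futex.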

>
>Most of the rest of this mail is just a checklist noting
>what I did with your comments. No response is needed
>in most cases, but there is one that I have marked with
>"???". If you could reply to that, I'd be grateful.

...

>> For all the PI opcodes, we should probably mention something about the
>> futex value scheme (TID), whereas the other opcodes do not require any
>> specific value scheme.
>> 
>> No Owner:	0
>> Owner:		TID
>> Waiters:	TID | FUTEX_WAITERS
>> 
>> This is the relevant section from the referenced paper:
>>
>> The PI futex operations diverge from the others in that they impose a
>> policy describing how the futex value is to be used. If the lock is
>> unowned, the futex value shall be 0. If owned, it shall be the thread
>> id (tid) of the owning thread. If there are threads contending for the
>> lock, then the FUTEX_WAITERS flag is set. With this policy in place,
>> userspace can atomically acquire an unowned lock or release an
>> uncontended lock using an atomic instruction and their own tid. A
>> non-zero futex value will force waiters into the kernel to lock. The
>> FUTEX_WAITERS flag forces the owner into the kernel to unlock. If the
>> callers are forced into the kernel, they then deal directly with an
>> underlying rt_mutex which implements the priority inheritance
>> semantics. After the rt_mutex is acquired, the futex value is updated
>> accordingly, before the calling thread returns to userspace.
>>
>> It is important to note that the kernel will update the futex value
>> prior to returning to userspace. Unlike other futex op codes,
>> FUTEX_CMP_REQUEUE_PI (and FUTEX_WAIT_REQUEUE_PI and FUTEX_LOCK_PI) are
>> designed for the implementation of very specific IPC mechanisms.
>
>??? Great text. May I presume that I can take this text
>and freely adapt it for the man page? (Actually, this is a
>request for forgiveness, rather than permission :-).)

Thanks, and no objection from me.
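
For reference, the value policy quoted above (0 when unowned, TID when owned, TID | FUTEX_WAITERS when contended) can be sketched in userspace C roughly as follows. The helper names (pi_lock, pi_unlock, pi_owner, pi_has_waiters) are made up for illustration; the op codes and the FUTEX_WAITERS / FUTEX_TID_MASK macros come from <linux/futex.h>. Error handling and robust-futex details are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Decode the futex word according to the PI value policy. */
static uint32_t pi_owner(uint32_t val)       { return val & FUTEX_TID_MASK; }
static int      pi_has_waiters(uint32_t val) { return (val & FUTEX_WAITERS) != 0; }

/* Acquire: try 0 -> TID with one atomic instruction. Any non-zero value
 * means the lock is owned, so the caller is forced into the kernel. */
static int pi_lock(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = 0;

    assert(tid != 0 && (tid & ~(uint32_t)FUTEX_TID_MASK) == 0);
    if (atomic_compare_exchange_strong(word, &expected, tid))
        return 0; /* unowned: fast path, no syscall */
    /* The kernel updates the futex value before returning to userspace. */
    return (int)syscall(SYS_futex, word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Release: try TID -> 0. If FUTEX_WAITERS is set the compare fails and the
 * owner is forced into the kernel to hand the lock to a waiter. */
static int pi_unlock(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = tid;

    if (atomic_compare_exchange_strong(word, &expected, 0))
        return 0; /* uncontended: fast path, no syscall */
    return (int)syscall(SYS_futex, word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

In real use the tid argument would be the caller's own thread id, for example from syscall(SYS_gettid). In the uncontended case neither helper enters the kernel.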

--
Darren Hart
Intel Open Source Technology Center

