Message-ID: <20151215211816.GR11972@malice.jf.intel.com>
Date: Tue, 15 Dec 2015 13:18:16 -0800
From: Darren Hart <dvhart@...radead.org>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Torvald Riegel <triegel@...hat.com>,
lkml <linux-kernel@...r.kernel.org>,
libc-alpha <libc-alpha@...rceware.org>,
linux-man <linux-man@...r.kernel.org>,
Carlos O'Donell <carlos@...hat.com>,
Roland McGrath <roland@...k.frob.com>,
Davidlohr Bueso <dave@...olabs.net>,
Jakub Jelinek <jakub@...hat.com>, Ingo Molnar <mingo@...e.hu>,
bill o gallmeister <bgallmeister@...il.com>,
bert hubert <bert.hubert@...herlabs.nl>,
Jan Kiszka <jan.kiszka@...mens.com>,
Eric Dumazet <edumazet@...gle.com>,
Arnd Bergmann <arnd@...db.de>,
Rusty Russell <rusty@...tcorp.com.au>,
Heinrich Schuchardt <xypron.glpk@....de>,
Andy Lutomirski <luto@...capital.net>,
Daniel Wagner <wagi@...om.org>,
Anton Blanchard <anton@...ba.org>,
Steven Rostedt <rostedt@...dmis.org>,
Rich Felker <dalias@...c.org>,
Jonathan Wakely <jwakely@...hat.com>,
Mike Frysinger <vapier@...too.org>
Subject: Re: futex(3) man page, final draft for pre-release review
On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello all,
>
> After much too long a time, the revised futex man page *will*
> go out in the next man pages release (it has been merged
> into master).
>
> There are various places where the page could still be improved,
> but it is much better (and more than 5 times longer) than the
> existing page.
>
> The rendered version of the page is shown below, so that people
> can make any final comments/suggestions for improvements
> before the release (but of course I'll also take any
> improvements after release as well). The page source is
> available from the Git repo
> (http://git.kernel.org/cgit/docs/man-pages/man-pages.git).
>
> As I mention above, there are various places where the page
> could still be better, so the rendered text below is annotated
> with some FIXMEs, in case anyone wants to address these before
> release.
>
> Thanks
>
> Michael
Fantastic! A few comments below.
...
>
> When executing a futex operation that requests to block a thread,
> the kernel will block only if the futex word has the value that
> the calling thread supplied (as one of the arguments of the
> futex() call) as the expected value of the futex word. The load‐
> ing of the futex word's value, the comparison of that value with
> the expected value, and the actual blocking will happen atomi‐
>
> FIXME: for next line, it would be good to have an explanation of
> "totally ordered" somewhere around here.
>
> cally and totally ordered with respect to concurrently executing
Totally ordered with respect to futex operations refers to the semantics of the
ACQUIRE/RELEASE operations and how they affect the ordering of memory reads and
writes. The kernel futex operations are protected by spinlocks, which ensure
that all operations on the same futex word are serialized with respect to one
another. This is a lot to attempt to define in this document. Perhaps a
reference to linux/Documentation/memory-barriers.txt as a footnote would be
sufficient? Or perhaps, for this manual, "serialized" would be sufficient, with
a footnote regarding "totally ordered" and a pointer to the memory-barrier
documentation?
> futex operations on the same futex word. Thus, the futex word is
> used to connect the synchronization in user space with the imple‐
> mentation of blocking by the kernel. Analogously to an atomic
> compare-and-exchange operation that potentially changes shared
> memory, blocking via a futex is an atomic compare-and-block oper‐
> ation.
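To make the compare-and-block behaviour concrete, here is a minimal sketch,
assuming Linux and glibc (glibc has no futex() wrapper, so the raw
syscall(SYS_futex, ...) is used) and the private variant of FUTEX_WAIT. The
helper names futex_wait() and wait_once() are hypothetical. If the futex word
does not hold the expected value, the kernel returns at once with EAGAIN; if it
does, the caller blocks, here until a short relative timeout expires.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc has no futex() wrapper, so issue the raw system call.
   FUTEX_WAIT_PRIVATE: the word is not shared with other processes. */
static long futex_wait(uint32_t *uaddr, uint32_t expected,
                       const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, expected,
                   timeout, NULL, 0);
}

/* One compare-and-block attempt with a relative timeout (< 1 s).
   Returns 0 if woken, otherwise the errno value, for example:
     EAGAIN    - *uaddr != expected, so the kernel did not block
     ETIMEDOUT - *uaddr == expected, the thread blocked until timeout */
static int wait_once(uint32_t *uaddr, uint32_t expected, long timeout_ns)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = timeout_ns };

    return futex_wait(uaddr, expected, &ts) == -1 ? errno : 0;
}
```

With a word holding 1, wait_once(&word, 0, 1000000) returns EAGAIN without
sleeping, while wait_once(&word, 1, 1000000) sleeps for about a millisecond and
returns ETIMEDOUT.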
...
> Futex operations
> The futex_op argument consists of two parts: a command that spec‐
> ifies the operation to be performed, bit-wise ORed with zero or
> or more options that modify the behaviour of the operation. The
> options that may be included in futex_op are as follows:
...
>
> FUTEX_CLOCK_REALTIME (since Linux 2.6.28)
> This option bit can be employed only with the
> FUTEX_WAIT_BITSET and FUTEX_WAIT_REQUEUE_PI operations.
That caught me by surprise, but it's true. We reject FUTEX_WAIT |
FUTEX_CLOCK_REALTIME, even though FUTEX_WAIT is treated as FUTEX_WAIT_BITSET
with val3=FUTEX_BITSET_MATCH_ANY.
Thomas, this looks like an oversight to me - do you recall if we intentionally
disallow FUTEX_CLOCK_REALTIME with FUTEX_WAIT?
> If this option is set, the kernel treats timeout as an
> absolute time based on CLOCK_REALTIME.
>
> If this option is not set, the kernel treats timeout as
> relative time, measured against the CLOCK_MONOTONIC clock.
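Since FUTEX_WAIT | FUTEX_CLOCK_REALTIME is rejected, as noted above, a caller
wanting FUTEX_WAIT semantics with an absolute CLOCK_REALTIME deadline can use
FUTEX_WAIT_BITSET with val3 = FUTEX_BITSET_MATCH_ANY. A minimal sketch, assuming
Linux/glibc and the private variant of the operation; wait_until_realtime() and
realtime_deadline_ms() are hypothetical helper names:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Block while *uaddr == expected, until an absolute CLOCK_REALTIME
   deadline. With FUTEX_CLOCK_REALTIME ORed into the operation, the
   timeout argument is a point in time, not an interval.
   Returns 0 on wake-up or the errno value on failure. */
static int wait_until_realtime(uint32_t *uaddr, uint32_t expected,
                               const struct timespec *deadline)
{
    long rc = syscall(SYS_futex, uaddr,
                      FUTEX_WAIT_BITSET_PRIVATE | FUTEX_CLOCK_REALTIME,
                      expected, deadline, NULL, FUTEX_BITSET_MATCH_ANY);
    return rc == -1 ? errno : 0;
}

/* Absolute CLOCK_REALTIME deadline 'ms' milliseconds from now. */
static struct timespec realtime_deadline_ms(long ms)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {   /* carry into seconds */
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}
```

The value comparison still happens first, so a mismatched expected value gives
EAGAIN even when the deadline has already passed; a matching value blocks until
the deadline and then fails with ETIMEDOUT.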
...
> Priority-inheritance futexes
...
> * If the lock is owned and there are threads contending for the
> lock, then the FUTEX_WAITERS bit shall be set in the futex
> word's value; in other words, this value is:
>
> FUTEX_WAITERS | TID
>
>
> (Note that is invalid for a PI futex word to have no owner and
^ it
> FUTEX_WAITERS set.)
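The encoding of a PI futex word can be illustrated with the mask constants that
<linux/futex.h> exports (FUTEX_TID_MASK, FUTEX_WAITERS, FUTEX_OWNER_DIED). This
is only a sketch: struct pi_word_state, pi_word_decode(), and pi_word_is_valid()
are hypothetical names, and the validity check encodes just the one rule stated
in the quoted text (FUTEX_WAITERS set with no owner TID).

```c
#include <linux/futex.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields packed into a PI futex word. */
struct pi_word_state {
    uint32_t owner_tid;  /* low bits (FUTEX_TID_MASK); 0 means unowned */
    bool waiters;        /* FUTEX_WAITERS: contenders are queued */
    bool owner_died;     /* FUTEX_OWNER_DIED: previous owner exited */
};

static struct pi_word_state pi_word_decode(uint32_t word)
{
    struct pi_word_state s = {
        .owner_tid  = word & FUTEX_TID_MASK,
        .waiters    = (word & FUTEX_WAITERS) != 0,
        .owner_died = (word & FUTEX_OWNER_DIED) != 0,
    };
    return s;
}

/* FUTEX_WAITERS without an owner TID is the invalid combination. */
static bool pi_word_is_valid(uint32_t word)
{
    struct pi_word_state s = pi_word_decode(word);

    return !(s.waiters && s.owner_tid == 0);
}
```

For example, pi_word_decode(FUTEX_WAITERS | 1234) gives owner_tid 1234 with the
waiters flag set, and pi_word_is_valid(FUTEX_WAITERS) is false.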
...
> FUTEX_TRYLOCK_PI (since Linux 2.6.18)
> This operation tries to acquire the futex at uaddr. It is
> invoked when a user-space atomic acquire did not succeed
> because the futex word was not 0.
>
>
> FIXME(Next sentence) The wording "The trylock in kernel" below
> needs clarification. Suggestions?
>
> The trylock in kernel might succeed because the futex word
The lock acquisition might succeed in the kernel because the futex word
> contains stale state (FUTEX_WAITERS and/or
> FUTEX_OWNER_DIED). This can happen when the owner of the
> futex died. User space cannot handle this condition in a
> race-free manner, but the kernel can fix this up and
> acquire the futex.
>
> The uaddr2, val, timeout, and val3 arguments are ignored.
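The user-space side of this can be sketched as follows, assuming Linux, glibc's
raw syscall(), and the GCC/Clang __atomic builtins; pi_trylock(), pi_unlock(),
and my_tid() are hypothetical helpers, not a library API. The fast path is a
compare-and-swap from 0 to the caller's TID; only when the word is nonzero is
the kernel asked, and it can take over a word left holding stale state such as
a bare FUTEX_OWNER_DIED.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The kernel identifies PI futex owners by thread ID. */
static uint32_t my_tid(void)
{
    return (uint32_t) syscall(SYS_gettid);
}

/* Try to take the PI futex at uaddr without blocking. Returns 0 on
   success or an errno value, e.g. EWOULDBLOCK if another live thread
   owns it or EDEADLK if the caller already does. */
static int pi_trylock(uint32_t *uaddr)
{
    uint32_t expected = 0;

    /* Fast path: an unowned, clean word is 0; install our TID. */
    if (__atomic_compare_exchange_n(uaddr, &expected, my_tid(), 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return 0;

    /* The word is nonzero. It may hold only stale state (for example
       FUTEX_OWNER_DIED with no owner TID), which user space cannot
       clear race-free, so let the kernel attempt the acquisition. */
    if (syscall(SYS_futex, uaddr, FUTEX_TRYLOCK_PI_PRIVATE, 0,
                NULL, NULL, 0) == -1)
        return errno;
    return 0;
}

/* Release a PI futex owned by the calling thread. */
static int pi_unlock(uint32_t *uaddr)
{
    uint32_t expected = my_tid();

    /* Fast path: no other bits set, so TID -> 0 releases it. */
    if (__atomic_compare_exchange_n(uaddr, &expected, 0, 0,
                                    __ATOMIC_RELEASE, __ATOMIC_RELAXED))
        return 0;

    /* FUTEX_WAITERS and/or FUTEX_OWNER_DIED is set: the kernel must
       hand the lock to a waiter or clear the word. */
    if (syscall(SYS_futex, uaddr, FUTEX_UNLOCK_PI_PRIVATE, 0,
                NULL, NULL, 0) == -1)
        return errno;
    return 0;
}
```

The unlock side mirrors the lock side: a TID-to-0 compare-and-swap suffices only
when no other bits are set; otherwise FUTEX_UNLOCK_PI does the work.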
...
> EXAMPLE
>
> FIXME I think it would be helpful here to say a few more words about
> the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI.
> Can someone propose something?
Hrm. It seems pretty straightforward to me. I guess I'm too close to it. What
about it seems unclear and needs clarification?
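For what it's worth, the practical difference fits in a few lines: while
another thread owns the lock, FUTEX_TRYLOCK_PI fails immediately with
EWOULDBLOCK, whereas FUTEX_LOCK_PI queues the caller on the kernel's PI state
(boosting the owner if needed) and blocks until it gets the lock or its timeout,
an absolute CLOCK_REALTIME time, expires. A minimal sketch, assuming Linux,
pthreads, and the GCC/Clang __atomic builtins; struct demo, owner_thread(), and
contend() are hypothetical names. Because a contender's FUTEX_LOCK_PI sets
FUTEX_WAITERS in the word, the owner's TID-to-0 unlock can fail afterwards, so
it falls back to FUTEX_UNLOCK_PI.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Raw PI futex call; val, uaddr2 and val3 are unused by these ops. */
static long futex_pi(uint32_t *uaddr, int op, const struct timespec *ts)
{
    return syscall(SYS_futex, uaddr, op, 0, ts, NULL, 0);
}

struct demo {
    uint32_t lock;     /* the PI futex word */
    uint32_t owned;    /* 1 once owner_thread() holds the lock */
    uint32_t release;  /* 1 when owner_thread() may unlock */
};

/* Takes the lock via the user-space fast path and holds it until told
   to release it. */
static void *owner_thread(void *arg)
{
    struct demo *d = arg;
    uint32_t tid = (uint32_t) syscall(SYS_gettid);
    uint32_t expected = 0;

    /* d->lock starts at 0, so this CAS succeeds: the word becomes TID. */
    __atomic_compare_exchange_n(&d->lock, &expected, tid, 0,
                                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
    __atomic_store_n(&d->owned, 1, __ATOMIC_RELEASE);
    while (!__atomic_load_n(&d->release, __ATOMIC_ACQUIRE))
        usleep(1000);

    /* FUTEX_WAITERS may now be set, so TID -> 0 can fail; in that case
       only the kernel can release the lock. */
    expected = tid;
    if (!__atomic_compare_exchange_n(&d->lock, &expected, 0, 0,
                                     __ATOMIC_RELEASE, __ATOMIC_RELAXED))
        futex_pi(&d->lock, FUTEX_UNLOCK_PI_PRIVATE, NULL);
    return NULL;
}

/* While another thread owns the lock, record the errno from
   FUTEX_TRYLOCK_PI and from FUTEX_LOCK_PI with a 20 ms deadline. */
static void contend(int *trylock_err, int *lock_err)
{
    struct demo d = { 0, 0, 0 };
    struct timespec deadline;
    pthread_t t;

    pthread_create(&t, NULL, owner_thread, &d);
    while (!__atomic_load_n(&d.owned, __ATOMIC_ACQUIRE))
        usleep(1000);

    /* Never blocks: fails at once because the owner is alive. */
    *trylock_err = futex_pi(&d.lock, FUTEX_TRYLOCK_PI_PRIVATE, NULL) == -1
                   ? errno : 0;

    /* Blocks on the owner until the lock is acquired or the absolute
       CLOCK_REALTIME deadline passes. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 20 * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    *lock_err = futex_pi(&d.lock, FUTEX_LOCK_PI_PRIVATE, &deadline) == -1
                ? errno : 0;

    __atomic_store_n(&d.release, 1, __ATOMIC_RELEASE);
    pthread_join(t, NULL);
}
```

Calling contend() should report EWOULDBLOCK for the trylock and ETIMEDOUT for the
blocking lock, which is really the whole difference between the two operations.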
--
Darren Hart
Intel Open Source Technology Center