Message-ID: <20151215211816.GR11972@malice.jf.intel.com>
Date:	Tue, 15 Dec 2015 13:18:16 -0800
From:	Darren Hart <dvhart@...radead.org>
To:	"Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Torvald Riegel <triegel@...hat.com>,
	lkml <linux-kernel@...r.kernel.org>,
	libc-alpha <libc-alpha@...rceware.org>,
	linux-man <linux-man@...r.kernel.org>,
	Carlos O'Donell <carlos@...hat.com>,
	Roland McGrath <roland@...k.frob.com>,
	Davidlohr Bueso <dave@...olabs.net>,
	Jakub Jelinek <jakub@...hat.com>, Ingo Molnar <mingo@...e.hu>,
	bill o gallmeister <bgallmeister@...il.com>,
	bert hubert <bert.hubert@...herlabs.nl>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Arnd Bergmann <arnd@...db.de>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Heinrich Schuchardt <xypron.glpk@....de>,
	Andy Lutomirski <luto@...capital.net>,
	Daniel Wagner <wagi@...om.org>,
	Anton Blanchard <anton@...ba.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Rich Felker <dalias@...c.org>,
	Jonathan Wakely <jwakely@...hat.com>,
	Mike Frysinger <vapier@...too.org>
Subject: Re: futex(3) man page, final draft for pre-release review

On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello all,
> 
> After much too long a time, the revised futex man page *will*
> go out in the next man pages release (it has been merged
> into master).
> 
> There are various places where the page could still be improved,
> but it is much better (and more than 5 times longer) than the
> existing page.
> 
> The rendered version of the page is shown below, so that people
> can make any final comments/suggestions for improvements
> before the release (but of course I'll also take any
> improvements after release as well). The page source is
> available from the Git repo 
> (http://git.kernel.org/cgit/docs/man-pages/man-pages.git).
> 
> As I mention above, there are various places where the page
> could still be better, so the rendered text below is annotated
> with some FIXMEs, in case anyone wants to address these before
> release.
> 
> Thanks
> 
> Michael

Fantastic! A few comments below.

...

> 
>        When executing a futex operation that requests to block a thread,
>        the kernel will block only if the futex word has the  value  that
>        the  calling  thread  supplied  (as  one  of the arguments of the
>        futex() call) as the expected value of the futex word.  The load‐
>        ing  of the futex word's value, the comparison of that value with
>        the expected value, and the actual blocking  will  happen  atomi‐
> 
> FIXME: for next line, it would be good to have an explanation of
> "totally ordered" somewhere around here.
> 
>        cally  and totally ordered with respect to concurrently executing

"Totally ordered" with respect to futex operations refers to the semantics of
the ACQUIRE/RELEASE operations and how they impact the ordering of memory reads
and writes. The kernel futex operations are protected by spinlocks, which ensure
that all operations are serialized with respect to one another.

This is a lot to attempt to define in this document. Perhaps a reference to
linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
perhaps for this manual, "serialized" would be sufficient, with a footnote
regarding "totally ordered" and a pointer to the memory-barrier documentation?

>        futex operations on the same futex word.  Thus, the futex word is
>        used to connect the synchronization in user space with the imple‐
>        mentation of blocking by the kernel.  Analogously  to  an  atomic
>        compare-and-exchange  operation  that  potentially changes shared
>        memory, blocking via a futex is an atomic compare-and-block oper‐
>        ation.
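
To make the "compare-and-block" wording concrete, here is a minimal sketch of
the user-space side, not a definitive implementation. The names futex_call()
and wait_while_equal() are made up for illustration, and the __atomic builtins
assume GCC or Clang. The kernel only blocks the caller if the futex word still
holds the expected value; otherwise the call fails with EAGAIN and the loop
rechecks the word.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical thin wrapper: glibc exports no futex() function,
 * so the call goes through syscall(2). */
static long futex_call(uint32_t *uaddr, int op, uint32_t val,
                       const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, NULL, 0);
}

/*
 * Hypothetical helper: block until *word is observed to differ from
 * `expected`. Returns 0 on success, -1 (errno set) on an unexpected error.
 */
static int wait_while_equal(uint32_t *word, uint32_t expected)
{
    while (__atomic_load_n(word, __ATOMIC_ACQUIRE) == expected) {
        /* The kernel re-reads *word and sleeps only if it still equals
         * `expected`; the load, compare and block are one atomic step. */
        long rc = futex_call(word, FUTEX_WAIT, expected, NULL);

        /* EAGAIN: the value changed before we could block.
         * EINTR: interrupted by a signal. Both just mean "recheck". */
        if (rc == -1 && errno != EAGAIN && errno != EINTR)
            return -1;
    }
    return 0;
}
```

A waker would store a new value to the word and then issue FUTEX_WAKE on the
same address; that side is left out here to keep the sketch short.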

...

>    Futex operations
>        The futex_op argument consists of two parts: a command that spec‐
>        ifies  the  operation to be performed, bit-wise ORed with zero or
>        or more options that modify the behaviour of the operation.   The
>        options that may be included in futex_op are as follows:

...

> 
>        FUTEX_CLOCK_REALTIME (since Linux 2.6.28)
>               This   option   bit   can   be   employed  only  with  the
>               FUTEX_WAIT_BITSET and FUTEX_WAIT_REQUEUE_PI operations.

That caught me by surprise, but it's true. We reject FUTEX_WAIT |
FUTEX_CLOCK_REALTIME, even though FUTEX_WAIT is treated as FUTEX_WAIT_BITSET
with val3=FUTEX_BITSET_MATCH_ANY.

Thomas, this looks like an oversight to me - do you recall if we intentionally
disallow FUTEX_CLOCK_REALTIME with FUTEX_WAIT?

>               If this option is set, the kernel  treats  timeout  as  an
>               absolute time based on CLOCK_REALTIME.
> 
>               If  this  option  is not set, the kernel treats timeout as
>               relative time, measured against the CLOCK_MONOTONIC clock.
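
For reference, a rough sketch of the combination the draft does allow:
FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and an absolute deadline. The
helper names wait_until_realtime() and realtime_deadline_ms() are invented for
this example; only the futex constants and the syscall itself come from the
kernel headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait on *uaddr while it equals `expected`, until the
 * absolute CLOCK_REALTIME time `deadline`. Returns 0 if woken, or -1 with
 * errno set (for example EAGAIN or ETIMEDOUT).
 */
static long wait_until_realtime(uint32_t *uaddr, uint32_t expected,
                                const struct timespec *deadline)
{
    /* FUTEX_BITSET_MATCH_ANY makes this behave like a plain wait. */
    return syscall(SYS_futex, uaddr,
                   FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
                   expected, deadline, NULL, FUTEX_BITSET_MATCH_ANY);
}

/* Hypothetical helper: absolute deadline `ms` milliseconds from now. */
static struct timespec realtime_deadline_ms(long ms)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {   /* normalize nanosecond overflow */
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}
```

Because the deadline is absolute, a caller that retries after EINTR can pass
the same timespec again without recomputing the remaining time.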

...

>    Priority-inheritance futexes

...

>        *  If  the lock is owned and there are threads contending for the
>           lock, then the FUTEX_WAITERS bit shall be  set  in  the  futex
>           word's value; in other words, this value is:
> 
>               FUTEX_WAITERS | TID
> 
> 
>           (Note that is invalid for a PI futex word to have no owner and

                      ^ it

>           FUTEX_WAITERS set.)
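
The layout of the PI futex word described above can be picked apart with the
masks from <linux/futex.h>. This is only an illustrative sketch; the pi_*()
function names are made up here and are not part of any API.

```c
#include <linux/futex.h>
#include <stdbool.h>
#include <stdint.h>

/* Owner TID lives in the low bits of the futex word (0 means unowned). */
static uint32_t pi_owner_tid(uint32_t word)
{
    return word & FUTEX_TID_MASK;
}

/* True if the kernel has recorded contending threads. */
static bool pi_has_waiters(uint32_t word)
{
    return (word & FUTEX_WAITERS) != 0;
}

/* Per the note above: FUTEX_WAITERS without an owner is not a valid state. */
static bool pi_word_is_valid(uint32_t word)
{
    return !(pi_has_waiters(word) && pi_owner_tid(word) == 0);
}
```
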
...

>        FUTEX_TRYLOCK_PI (since Linux 2.6.18)
>               This operation tries to acquire the futex at uaddr.  It is
>               invoked when a user-space atomic acquire did  not  succeed
>               because the futex word was not 0.
> 
> 
> FIXME(Next sentence) The wording "The trylock in kernel" below 
> needs clarification. Suggestions?
> 
>               The trylock in kernel might succeed because the futex word

The lock acquisition might succeed in the kernel because the futex word

>               contains     stale     state     (FUTEX_WAITERS     and/or
>               FUTEX_OWNER_DIED).   This can happen when the owner of the
>               futex died.  User space cannot handle this condition in  a
>               race-free  manner,  but  the  kernel  can  fix this up and
>               acquire the futex.
> 
>               The uaddr2, val, timeout, and val3 arguments are ignored.
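
A possible way to picture the intended usage, as a sketch only: user space
first attempts the 0 -> TID transition atomically and calls FUTEX_TRYLOCK_PI
only when that fails. The names pi_trylock() and pi_unlock() are hypothetical,
the __atomic builtins assume GCC or Clang, and error handling is reduced to
passing errno through.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: try to take a PI futex without blocking.
 * Returns 0 on success, -1 with errno set on failure. */
static int pi_trylock(uint32_t *uaddr)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t)syscall(SYS_gettid);

    /* Fast path: uncontended, so 0 -> TID needs no system call. */
    if (__atomic_compare_exchange_n(uaddr, &expected, tid, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return 0;

    /* Slow path: the word was not 0. It may hold stale state left by a
     * dead owner, which only the kernel can clean up safely. */
    return (int)syscall(SYS_futex, uaddr, FUTEX_TRYLOCK_PI, 0, NULL, NULL, 0);
}

/* Hypothetical helper: release a PI futex taken by the calling thread. */
static int pi_unlock(uint32_t *uaddr)
{
    uint32_t expected = (uint32_t)syscall(SYS_gettid);

    /* Fast path: no waiters recorded, so TID -> 0 is enough. */
    if (__atomic_compare_exchange_n(uaddr, &expected, 0, 0,
                                    __ATOMIC_RELEASE, __ATOMIC_RELAXED))
        return 0;

    /* Slow path: FUTEX_WAITERS is set; the kernel hands the lock over. */
    return (int)syscall(SYS_futex, uaddr, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

A blocking variant would look the same except that the slow path issues
FUTEX_LOCK_PI, which waits for the owner instead of returning immediately.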

...

>    EXAMPLE
> 
> FIXME I think it would be helpful here to say a few more words about
>       the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI.
>       Can someone propose something?

Hrm. It seems pretty straightforward to me. I guess I'm too close to it. What
about it seems unclear and needs clarification?

-- 
Darren Hart
Intel Open Source Technology Center