linux-kernel - Re: [PATCH v2] mutex: Documentation rewrite

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <537E2F63.1080704@infradead.org>
Date:	Thu, 22 May 2014 10:09:55 -0700
From:	Randy Dunlap <rdunlap@...radead.org>
To:	Davidlohr Bueso <davidlohr@...com>, mingo@...nel.org,
	peterz@...radead.org, tglx@...utronix.de
CC:	akpm@...ux-foundation.org, tim.c.chen@...ux.intel.com,
	paulmck@...ux.vnet.ibm.com, hpa@...or.com, waiman.long@...com,
	jason.low2@...com, aswin@...com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mutex: Documentation rewrite

On 05/22/2014 09:41 AM, Davidlohr Bueso wrote:
> From: Davidlohr Bueso <davidlohr@...com>
> 
> Our mutexes have gone a long ways since the original implementation
> back in 2005/2006. However, the mutex-design.txt document is still
> stuck in the past, to the point where most of the information there
> is practically useless and, more important, simply incorrect. This
> patch pretty much rewrites it to resemble what we have nowadays.
> 
> Since regular semaphores are almost much extinct in the kernel
> (most users now rely on mutexes or rwsems), it no longer makes
> sense to have such a close comparison, which was copied from most
> of the cover letter when Ingo introduced the generic mutex subsystem.
> 
> Note that ww_mutexes are intentionally left out, leaving things as
> generic as possible.
> 
> Signed-off-by: Davidlohr Bueso <davidlohr@...com>
> ---
> Changes from v2:
>  - Grammar corrections.
>  - Document cancelable MCS properties.
> 
>  Documentation/mutex-design.txt | 252 ++++++++++++++++++++++-------------------
>  1 file changed, 135 insertions(+), 117 deletions(-)
> 
> diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt
> index 1dfe62c..d9b0be5 100644
> --- a/Documentation/mutex-design.txt
> +++ b/Documentation/mutex-design.txt
> @@ -1,139 +1,157 @@
>  Generic Mutex Subsystem
>  
>  started by Ingo Molnar <mingo@...hat.com>
> +updated by Davidlohr Bueso <davidlohr@...com>
>  
> -  "Why on earth do we need a new mutex subsystem, and what's wrong
> -   with semaphores?"
> +What are mutexes?
> +-----------------
>  
> -firstly, there's nothing wrong with semaphores. But if the simpler
> -mutex semantics are sufficient for your code, then there are a couple
> -of advantages of mutexes:
> +In the Linux kernel, mutexes refer to a particular locking primitive
> +that enforces serialization on shared memory systems, and not only to
> +the generic term referring to 'mutual exclusion' found in academia
> +or similar theoretical text books. Mutexes are sleeping locks which
> +behave similarly to binary semaphores, and were introduced in 2006[1]
> +as an alternative to these. This new data structure provided a number
> +of advantages, including simpler interfaces, and at that time smaller
> +code (see Disadvantages).
>  
> - - 'struct mutex' is smaller on most architectures: E.g. on x86,
> -   'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
> -   A smaller structure size means less RAM footprint, and better
> -   CPU-cache utilization.
> +[1] http://lwn.net/Articles/164802/
>  
> - - tighter code. On x86 i get the following .text sizes when
> -   switching all mutex-alike semaphores in the kernel to the mutex
> -   subsystem:
> +Implementation
> +--------------
>  
> -        text    data     bss     dec     hex filename
> -     3280380  868188  396860 4545428  455b94 vmlinux-semaphore
> -     3255329  865296  396732 4517357  44eded vmlinux-mutex
> +Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
> +and implemented in kernel/locking/mutex.c. These locks use a three
> +state atomic counter (->count) to represent the different possible
> +transitions that can occur during the lifetime of a lock:
>  
> -   that's 25051 bytes of code saved, or a 0.76% win - off the hottest
> -   codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
> -   Smaller code means better icache footprint, which is one of the
> -   major optimization goals in the Linux kernel currently.
> +	  1: unlocked
> +	  0: locked, no waiters
> +   negative: locked, with potential waiters
>  
> - - the mutex subsystem is slightly faster and has better scalability for
> -   contended workloads. On an 8-way x86 system, running a mutex-based
> -   kernel and testing creat+unlink+close (of separate, per-task files)
> -   in /tmp with 16 parallel tasks, the average number of ops/sec is:
> +In its most basic form it also includes a wait-queue and a spinlock
> +that serializes access to it. CONFIG_SMP systems can also include
> +a pointer to the lock task owner (->owner) as well as a spinner MCS
> +lock (->osq), both described below in (ii).
>  
> -    Semaphores:                        Mutexes:
> +When acquiring a mutex, there are three possible paths that can be
> +taken, depending on the state of the lock:
>  
> -    $ ./test-mutex V 16 10             $ ./test-mutex V 16 10
> -    8 CPUs, running 16 tasks.          8 CPUs, running 16 tasks.
> -    checking VFS performance.          checking VFS performance.
> -    avg loops/sec:      34713          avg loops/sec:      84153
> -    CPU utilization:    63%            CPU utilization:    22%
> +(i) fastpath: tries to atomically acquire the lock by decrementing the
> +    counter. If it was already taken by another task it goes to the next
> +    possible path. This logic is architecture specific. On x86-64, the
> +    locking fastpath is 2 instructions:
>  
> -   i.e. in this workload, the mutex based kernel was 2.4 times faster
> -   than the semaphore based kernel, _and_ it also had 2.8 times less CPU
> -   utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
> -   performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
> -   performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
> -   more efficient.)
> -
> -   the scalability difference is visible even on a 2-way P4 HT box:
> -
> -    Semaphores:                        Mutexes:
> -
> -    $ ./test-mutex V 16 10             $ ./test-mutex V 16 10
> -    4 CPUs, running 16 tasks.          8 CPUs, running 16 tasks.
> -    checking VFS performance.          checking VFS performance.
> -    avg loops/sec:      127659         avg loops/sec:      181082
> -    CPU utilization:    100%           CPU utilization:    34%
> -
> -   (the straight performance advantage of mutexes is 41%, the per-cycle
> -    efficiency of mutexes is 4.1 times better.)
> -
> - - there are no fastpath tradeoffs, the mutex fastpath is just as tight
> -   as the semaphore fastpath. On x86, the locking fastpath is 2
> -   instructions:
> -
> -    c0377ccb <mutex_lock>:
> -    c0377ccb:       f0 ff 08                lock decl (%eax)
> -    c0377cce:       78 0e                   js     c0377cde <.text..lock.mutex>
> -    c0377cd0:       c3                      ret
> +    0000000000000e10 <mutex_lock>:
> +    e21:   f0 ff 0b                lock decl (%rbx)
> +    e24:   79 08                   jns    e2e <mutex_lock+0x1e>
>  
>     the unlocking fastpath is equally tight:
>  
> -    c0377cd1 <mutex_unlock>:
> -    c0377cd1:       f0 ff 00                lock incl (%eax)
> -    c0377cd4:       7e 0f                   jle    c0377ce5 <.text..lock.mutex+0x7>
> -    c0377cd6:       c3                      ret
> -
> - - 'struct mutex' semantics are well-defined and are enforced if
> -   CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
> -   virtually no debugging code or instrumentation. The mutex subsystem
> -   checks and enforces the following rules:
> -
> -   * - only one task can hold the mutex at a time
> -   * - only the owner can unlock the mutex
> -   * - multiple unlocks are not permitted
> -   * - recursive locking is not permitted
> -   * - a mutex object must be initialized via the API
> -   * - a mutex object must not be initialized via memset or copying
> -   * - task may not exit with mutex held
> -   * - memory areas where held locks reside must not be freed
> -   * - held mutexes must not be reinitialized
> -   * - mutexes may not be used in hardware or software interrupt
> -   *   contexts such as tasklets and timers
> -
> -   furthermore, there are also convenience features in the debugging
> -   code:
> -
> -   * - uses symbolic names of mutexes, whenever they are printed in debug output
> -   * - point-of-acquire tracking, symbolic lookup of function names
> -   * - list of all locks held in the system, printout of them
> -   * - owner tracking
> -   * - detects self-recursing locks and prints out all relevant info
> -   * - detects multi-task circular deadlocks and prints out all affected
> -   *   locks and tasks (and only those tasks)
> +    0000000000000bc0 <mutex_unlock>:
> +    bc8:   f0 ff 07                lock incl (%rdi)
> +    bcb:   7f 0a                   jg     bd7 <mutex_unlock+0x17>
> +
> +
> +(ii) midpath: aka optimistic spinning, tries to spin for acquisition
> +     when there are no pending waiters and the lock owner is currently
> +     running on a different CPU. The rationale is that if the lock owner
> +     is running, it is likely to release the lock soon. The mutex spinners
> +     are queued up using MCS lock so that only one spinner can compete for
> +     the mutex.
> +
> +     The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
> +     with the desirable properties of being fair and with each cpu trying
> +     to acquire the lock spinning on a local variable. It avoids expensive
> +     cacheline bouncing that common test-and-set spinlock implementations
> +     incur. An MCS-like lock is specially tailored for optimistic spinning
> +     for sleeping lock implementation. An important feature of the customized
> +     MCS lock is that it has the extra property that spinners are able to exit
> +     the MCS spinlock queue when they needs to reschedule. This further helps

	                                 need

> +     avoid situations where MCS spinners that need to reschedule would continue
> +     waiting to spin on mutex owner, only to go directly to slowpath upon
> +     obtaining the MCS lock.
> +
> +
> +(iii) slowpath: last resort, if the lock is still unable to be acquired

	                                                          acquired,

> +      the task is added to the wait-queue and sleeps until it can be taken.
> +      Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.
> +
> +While formally kernel mutexes are sleepable locks, it is path (ii) that
> +makes them more practically a hybrid type. By simply not interrupting a
> +task and busy-waiting for a few cycles instead of immediately sleeping,
> +the performance of this lock has been seen to significantly improve a
> +number of workloads. Note that this technique is also used for rw-semaphores.
> +
> +Semantics
> +---------
> +
> +The mutex subsystem checks and enforces the following rules:
> +
> +    - Only one task can hold the mutex at a time.
> +    - Only the owner can unlock the mutex.
> +    - Multiple unlocks are not permitted.
> +    - Recursive locking/unlocking is not permitted.
> +    - A mutex must only be initialized via the API (see below).
> +    - A task may not exit with a mutex held.
> +    - Memory areas where held locks reside must not be freed.
> +    - Held mutexes must not be reinitialized.
> +    - Mutexes may not be used in hardware or software interrupt
> +      contexts such as tasklets and timers.
> +
> +These semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled.
> +In addition, the mutex debugging code also implements a number of other
> +features that make lock debugging easier and faster:
> +
> +    - Uses symbolic names of mutexes, whenever they are printed
> +      in debug output.
> +    - Point-of-acquire tracking, symbolic lookup of function names

confusing.  Should there be a comma after "function names"?

> +      list of all locks held in the system, printout of them.
> +    - Owner tracking.
> +    - Detects self-recursing locks and prints out all relevant info.
> +    - Detects multi-task circular deadlocks and prints out all affected
> +      locks and tasks (and only those tasks).
> +
> +
> +Interfaces
> +----------
> +Statically define the mutex:
> +   DEFINE_MUTEX(name);
> +
> +Dynamically initialize the mutex:
> +   mutex_init(mutex);
> +
> +Acquire the mutex, uninterruptable:

                                 ible:

> +   void mutex_lock(struct mutex *lock);
> +   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
> +   int  mutex_trylock(struct mutex *lock);
> +
> +Acquire the mutex, interruptible:
> +   int mutex_lock_interruptible_nested(struct mutex *lock,
> +				       unsigned int subclass);
> +   int mutex_lock_interruptible(struct mutex *lock);
> +
> +Acquire the mutex, interruptible, if dec to 0:
> +   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
> +
> +Unlock the mutex:
> +   void mutex_unlock(struct mutex *lock);
> +
> +Test if the mutex is taken:
> +   int mutex_is_locked(struct mutex *lock);
>  
>  Disadvantages
>  -------------
>  
> -The stricter mutex API means you cannot use mutexes the same way you
> -can use semaphores: e.g. they cannot be used from an interrupt context,
> -nor can they be unlocked from a different context that which acquired
> -it. [ I'm not aware of any other (e.g. performance) disadvantages from
> -using mutexes at the moment, please let me know if you find any. ]
> -
> -Implementation of mutexes
> --------------------------
> -
> -'struct mutex' is the new mutex type, defined in include/linux/mutex.h and
> -implemented in kernel/locking/mutex.c. It is a counter-based mutex with a
> -spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", 0 for
> -"locked" and negative numbers (usually -1) for "locked, potential waiters
> -queued".
> -
> -the APIs of 'struct mutex' have been streamlined:
> -
> - DEFINE_MUTEX(name);
> +Unlike its original design and purpose, 'struct mutex' is larger than
> +most locks in the kernel. E.g: on x86-64 it is 40 bytes, almost twice
> +as large as 'struct semaphore' (24 bytes) and 8 bytes shy of the
> +'struct rw_semaphore' variant. Larger structure sizes mean more CPU
> +cache and memory footprint.
>  
> - mutex_init(mutex);
> +When to use mutexes
> +-------------------
>  
> - void mutex_lock(struct mutex *lock);
> - int  mutex_lock_interruptible(struct mutex *lock);
> - int  mutex_trylock(struct mutex *lock);
> - void mutex_unlock(struct mutex *lock);
> - int  mutex_is_locked(struct mutex *lock);
> - void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
> - int  mutex_lock_interruptible_nested(struct mutex *lock,
> -                                      unsigned int subclass);
> - int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
> +Unless the strict semantics of mutexes are unsuitable and/or the critical
> +region prevents the lock from being shared, always prefer them to any other
> +locking primitive.
> 


-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/