linux-kernel - Re: [PATCH v4] doc: brief user documentation for completion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150219132933.GA11636@opentech.at>
Date:	Thu, 19 Feb 2015 14:29:33 +0100
From:	Nicholas Mc Guire <der.herr@...r.at>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Jonathan Corbet <corbet@....net>, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>, Valdis.Kletnieks@...edu,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] doc: brief user documentation for completion

On Wed, 18 Feb 2015, Ingo Molnar wrote:

> 
> * Nicholas Mc Guire <der.herr@...r.at> wrote:
> 
> > Signed-off-by: Nicholas Mc Guire <der.herr@...r.at>
> > ---
> > 
> > v3: cleanups and merged review notes from Jonathan Corbet <corbet@....net>
> > v4: english cleanup and an important ommission on scope from
> >     Valdis Kletniek <Valdis.Kletnieks@...edu>
> > 
> >  Documentation/scheduler/completion.txt |  245 ++++++++++++++++++++++++++++++++
> >  1 file changed, 245 insertions(+)
> >  create mode 100644 Documentation/scheduler/completion.txt
> > 
> > diff --git a/Documentation/scheduler/completion.txt b/Documentation/scheduler/completion.txt
> > new file mode 100644
> > index 0000000..72cbda2
> > --- /dev/null
> > +++ b/Documentation/scheduler/completion.txt
> > @@ -0,0 +1,245 @@
> > +completions - wait for completion handling
> > +==========================================
> > +
> > +This document was originally written based on 3.18.0 (linux-next)
> > +
> > +Introduction:
> > +-------------
> > +
> > +If you have one or more threads of execution that must wait for some process
> > +to have reached a point or a specific state, completions can provide a race
> > +free solution to this problem. Semantically they are somewhat like a
> > +pthread_barriers and have similar use-cases.
> > +
> > +Completions are a code synchronization mechanism that is preferable to any
> 
> s/that is/which are ?
> 
> > +misuse of locks. Any time you think of using yield() or some quirky
> > +msleep(1); loop to allow something else to proceed, you probably want to
> > +look into using one of the wait_for_completion*() calls instead. The
> > +advantage of using completions is clear intent of the code but also more
> 
> s/code but/code, but
> 
> > +efficient code as both threads can continue until the result is actually
> > +needed.
> > +
> > +Completions are built on top of the generic event infrastructure in Linux,
> > +with the event reduced to a simple flag appropriately called "done" in
> > +struct completion, that tells the waiting threads of execution if they
> > +can continue safely.
> > +
> > +As completions are scheduling related the code is found in
> 
> s/related the/related, the
> 
> > +kernel/sched/completion.c - for details on completion design and
> > +implementation see completions-design.txt
> > +
> > +
> > +Usage:
> > +------
> > +
> > +There are three parts to the using completions, the initialization of the
> 
> s/to the using/to using
> 
> > +struct completion, the waiting part through a call to one of the variants of
> > +wait_for_completion() and the signaling side through a call to complete(),
> > +or complete_all(). Further there are some helper functions for checking the
> > +state of completions.
> > +
> > +To use completions one needs to include <linux/completion.h> and
> > +create a variable of type struct completion. The structure used for
> > +handling of completions is:
> > +
> > +	struct completion {
> > +		unsigned int done;
> > +		wait_queue_head_t wait;
> > +	};
> > +
> > +providing the wait queue to place tasks on for waiting and the flag for
> > +indicating the state of affairs.
> > +
> > +Completions should be named to convey the intent of the waiter.  A good
> > +example is:
> > +
> > +	wait_for_completion(&early_console_added);
> > +
> > +	complete(&early_console_added);
> > +
> > +Good naming (as always) helps code readability.
> > +
> > +
> > +Initializing completions:
> > +-------------------------
> > +
> > +Initialization of dynamically allocated completions, often embedded in
> > +other structures, is done with:
> > +
> > +	void init_completion(&done);
> > +
> > +Initialization is accomplished by initializing the wait queue and setting
> > +the default state to "not available", that is, "done" is set to 0.
> > +
> > +The re-initialization function, reinit_completion(), simply resets the
> > +done element to "not available", thus again to 0, without touching the
> > +wait queue. Calling init_completion() on the same completions object is
> 
> s/completions object/completion object
> 
> > +most likely a bug as it re-initializes the queue to an empty queue and
> > +enqueued tasks could get "lost" - use reinit_completion() in that case.
> > +
> > +For static declaration and initialization, macros are available. These are:
> > +
> > +	static DECLARE_COMPLETION(setup_done)
> > +
> > +used for static declarations in file scope. Within functions the static
> > +initialization should always use:
> > +
> > +	DECLARE_COMPLETION_ONSTACK(setup_done)
> > +
> > +suitable for automatic/local variables on the stack and will make lockdep
> > +happy. Note also that one needs to making *sure* the completion passt to
> 
> s/needs to making *sure*/needs to make *sure*/
> s/passt/passed ?
> 
> > +work threads remains in-scope, and no references remain to on-stack data
> > +when the initiating function returns.
> > +
> > +
> > +Waiting for completions:
> > +------------------------
> > +
> > +For a thread of execution to wait for some concurrent work to finish, it
> > +calls wait_for_completion() on the initialized completion structure.
> > +A typical usage scenario is:
> > +
> > +
> > +	structure completion setup_done;
> > +	init_completion(&setup_done);
> > +	initialze_work(...,&setup_done,...)
> > +
> > +	/* run non-dependent code */              /* do setup */
> > +
> > +	wait_for_completion(&seupt_done);         complete(setup_done)
> > +
> > +This is not implying any temporal order of wait_for_completion() and the
> 
> this does not imply any temporar order on?
> 
> > +call to complete() - if the call to complete() happened before the call
> > +to wait_for_completion() then the waiting side simply will continue
> > +immediately as all dependencies are satisfied.
> > +
> > +Note that wait_for_completion() is calling spin_lock_irq/spin_unlock_irq
> > +so it can only be called safely when you know that interrupts are enabled.
> > +Calling it from hard-irq context will result in hard to detect spurious
> > +enabling of interrupts.
> 
> It's not just about hardirq contexts, but also about 
> irqs-off atomic contexts.
>

this is what I ment with "when you know that interrupts are enabled" added in
the irqs-off atomic contexts explicitly.
 
> > +
> > +
> > +wait_for_completion():
> > +
> > +	void wait_for_completion(struct completion *done):
> > +
> > +The default behavior is to wait without a timeout and mark the task as
> > +uninterruptible. wait_for_completion() and its variants are only safe
> > +in soft-interrupt or process context but not in hard-irq context.
> 
> I don't think wait_for_completion() is safe in softirq 
> context ...
>
> > +As all variants of wait_for_completion() can (obviously) block for a long
> > +time, you probably don't want to call this with held locks - see also
> > +try_wait_for_completion() below.
> 
> Not 'probably': you must not call it from atomic contexts, 
> held spinlocks, elevated preempt count, disabled irqs, etc.
>
> > +
> > +
> > +Variants available:
> > +-------------------
> > +
> > +The below variants all return status and this status should be checked in
> > +most(/all) cases - in cases where the status is deliberately not checked you
> > +probably want to make a note explaining this (e.g. see
> > +arch/arm/kernel/smp.c:__cpu_up()).
> > +
> > +A common problem that occurs is to have unclean assignment of return types,
> > +so care should be taken with assigning return-values to variables of proper
> > +type. Checking for the specific meaning of return values also has been found
> > +to be quite inaccurate e.g. constructs like
> > +if(!wait_for_completion_interruptible_timeout(...)) would execute the same
> 
> s/if(/if (/
> 
> > +code path for successful completion and for the interrupted case - which is
> > +probably not what you want.
> > +
> > +
> > +	int wait_for_completion_interruptible(struct completion *done)
> > +
> > +marking the task TASK_INTERRUPTIBLE. If a signal was received while waiting.
> > +It will return -ERESTARTSYS and 0 otherwise.
> 
> s/marking/This function marks
> 
> The two final sentences should probably be one?
> 
> > +
> > +
> > +	unsigned long wait_for_completion_timeout(struct completion *done,
> > +		unsigned long timeout)
> > +
> > +The task is marked as TASK_UNINTERRUPTIBLE and will wait at most timeout
> 
> s/timeout/'timeout'
> 
> > +(in jiffies). If timeout occurs it return 0 else the remaining time in
> 
> s/it return 0/it returns 0,
> 
> > +jiffies (but at least 1). Timeouts are preferably passed by msecs_to_jiffies()
> > +or usecs_to_jiffies(). If the returned timeout value is deliberately ignored
> > +a comment should probably explain why (e.g. see drivers/mfd/wm8350-core.c
> > +wm8350_read_auxadc())
> > +
> > +
> > +	long wait_for_completion_interruptible_timeout(
> > +		struct completion *done, unsigned long timeout)
> > +
> > +passing a timeout in jiffies and marking the task as TASK_INTERRUPTIBLE. If a
> 
> s/passing/This function passes
> 
> > +signal was received it will return -ERESTARTSYS, 0 if completion timed-out and
> 
> s/timed-out/timed out
> 
> > +the remaining time in jiffies if completion occurred.
> > +
> > +Further variants include _killable which passes TASK_KILLABLE as the
> > +designated tasks state and will return a -ERESTARTSYS if interrupted or
> 
> s/return a -ERESTARTSYS/return -ERESTARTSYS
> 
> > +else 0 if completions was achieved as well as a _timeout variant.
> 
> s/if completions was achieved/if completion was achieved/
> 
> > +
> > +	long wait_for_completion_killable(struct completion *done)
> > +	long wait_for_completion_killable_timeout(struct completion *done,
> > +		unsigned long timeout)
> > +
> > +
> > +The _io variants wait_for_completion_io behave the same as the non-_io
> 
> s/wait_for_completion_io/wait_for_completion_io()
> 
> > +variants, except for accounting waiting time as waiting on IO, which has
> > +an impact on how scheduling is calculated.
> 
> s/how scheduling is calculated/how the task is accounted in scheduling stats
> 
> > +
> > +	void wait_for_completion_io(struct completion *done)
> > +	unsigned long wait_for_completion_io_timeout(struct completion *done
> > +		unsigned long timeout)
> > +
> > +
> > +Signaling completions:
> > +----------------------
> > +
> > +A thread of execution that wants to signal that the conditions for
> 
> s/thread of execution/thread
> 
> > +continuation have been achieved calls complete() to signal exactly one
> > +of the waiters that it can continue.
> > +
> > +	void complete(struct completion *done)
> > +
> > +or calls complete_all to signal all current and future waiters.
> 
> s/complete_all/complete_all()
> 
> > +
> > +	void complete_all(struct completion *done)
> > +
> > +
> > +The signaling will work as expected even if completions are signaled before
> > +a thread starts waiting. This is achieved by the waiter "consuming"
> > +(decrementing) the done element of struct completion. Waiting threads
> > +wakeup order is the same in which they were enqueued (FIFO order).
> > +
> > +If complete() is called multiple times then this will allow for that number
> > +of waiters to continue - each call to complete() will simply increment the
> > +done element. Calling complete_all() multiple times is a bug though. Both
> > +complete() and complete_all() can be called in hard-irq context safely.
> > +
> > +There only can be one thread calling complete() or complete_all() on a
> > +particular struct completions at any time - serialized through the wait
> 
> s/struct completions/struct completion
> 
> > +queue spinlock. Any such concurrent calls to complete() or complete_all()
> > +probably are a design bug.
> > +
> > +Signaling completion from hard-irq context is fine as it will appropriately
> > +lock with spin_lock_irqsave/spin_unlock_irqrestore.
> > +
> > +
> > +try_wait_for_completion()/completion_done():
> > +--------------------------------------------
> > +
> > +The try_wait_for_completion will not put the thread on the wait queue but
> 
> s/The try_wait_for_completion will/
>   The try_wait_for_completion() function will/
> 
> > +rather returns false if it would need to enqueue (block) the thread, else it
> 
> s/if it would need to enqueue/if it needed to enqueue/ ?
> 
> > +consumes any posted completions and returns true.
> > +
> > +     bool try_wait_for_completion(struct completion *done)
> > +
> > +
> > +Finally to check state of a completions without changing it in any way is
> 
> s/completions/completion
> 
> > +provided by completion_done() returning false if there are any posted
> 
> s/if there are any/if there is any
> 
> > +completion that was not yet consumed by waiters implying that there are
> > +waiters and true otherwise;
> > +
> > +     bool completion_done(struct completion *done)
> > +
> > +Both try_wait_for_completion() and completion_done() are safe to be called in
> > +hard-irq context.
> 
> irq atomic context != hard-irq context.
> 
> This needs to be fixed throughout the document.
>
thanks for the detailed review notes

will give it another run regarding locking and context - I obviously need a 
rerun regarding context and locking my self.

And thanks for the language notes - atleast those are clear.

thx!
hofrat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/