Date:	Fri, 24 Jun 2016 10:08:44 +0100
From:	Chris Wilson <chris@...is-wilson.co.uk>
To:	linux-kernel@...r.kernel.org
Subject: Introduce fences for N:M completion variables

struct completion allows for multiple waiters on a single event.
However, frequently we want to wait on multiple events. For example,
in job processing we need to wait for all prerequisite tasks to
complete before proceeding. Such dependency tracking is common to many
situations. In dma-buf, we already have a mechanism for tracking
dependencies between tasks and across drivers: the fence. Each fence is
a fixed point on a timeline that the hardware is processing (though the
hardware may be executing from multiple timelines concurrently). Each
fence may wait on any other fence (and for native fences the wait may be
executed on the device, but otherwise the signaling and forward progress
of the inter-fence serialisation is provided by the drivers themselves).
The added complexity of hardware interaction makes the dma-buf fence
unwieldy as a drop-in extension of struct completion. Enter kfence.
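
To illustrate the gap, here is a minimal sketch of the N:1 case with
today's primitives (struct job and do_work() are hypothetical, for
illustration only): 1:N is easy, since many waiters can block on one
completion, but a single waiter with many prerequisites has to chain
wait_for_completion() calls by hand.

#include <linux/completion.h>

struct job {
	struct completion done;
	/* ... payload ... */
};

void do_work(struct job *job); /* hypothetical */

static void run_job(struct job *job, struct job *prereqs[], int n)
{
	int i;

	/* Wait on each event in turn, even though the prerequisites
	 * may finish in any order. */
	for (i = 0; i < n; i++)
		wait_for_completion(&prereqs[i]->done);

	do_work(job);
	complete_all(&job->done); /* wake every waiter on this job */
}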

The kfence is intended to be as easy to use as a struct completion in
order to provide barriers in a DAG of tasks. It can provide
serialisation with other software events just as easily as it can mix in
dma-fences and be used to construct an event-driven state machine.
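
As a sketch of the intended ergonomics (the names below are
illustrative of the proposal, not a final interface), a task C that
depends on tasks A and B might read:

/* Illustrative sketch only: names approximate the proposal. */
struct kfence *fence = kfence_create(GFP_KERNEL);

/* C depends on both A and B; they may signal in any order. */
kfence_await_kfence(fence, task_A_fence, GFP_KERNEL);
kfence_await_kfence(fence, task_B_fence, GFP_KERNEL);

/* No further dependencies will be added; the fence signals once
 * every awaited event has fired. */
kfence_complete(fence);

kfence_wait(fence); /* block, or hook a callback for event-driven use */
do_task_C();        /* hypothetical */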

The tasks I have applied kfence to are:

 * providing fine-grained dependency and concurrent execution for the
   global initcalls. Drivers currently use the fixed initcall phases
   creatively to solve dependency problems. Knowing which initcalls
   can be executed in parallel helps speed up the boot process (though
   not as much as removing the barrier after initramfs does!).

 * providing fine-grained dependency and concurrent execution for
   load/resume within a module (within the overall global async
   execution). Parallelising a driver between discovery and hardware
   setup is hard to retrofit and will be challenging to maintain
   without a mechanism by which we can describe the dependencies of
   each phase upon the others (and upon hardware state) and then let
   the kernel resolve the order in which to execute the phases. In
   effect, we want a declarative syntax.

 * providing asynchronous execution of GPU rendering (for a mix of
   inter-device and inter-engine rendering without hardware scheduling).
   This mixes dma-fences with an event-driven state machine. Here, the
   kfence primarily serves as a collection of dma-fences.

 * providing asynchronous execution of atomic modesetting,
   mixing the current usage of struct completion with dma-fences into
   one consistent framework (see the sketch after this list).
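
For instance (again an illustrative sketch, not the final interface,
with plane_fence, flip_done and prev_commit hypothetical), one barrier
could gather a hardware dma-fence, a software completion and an
earlier kfence, and the modeset would then wait on the single barrier:

/* Illustrative sketch: one kfence collecting mixed event sources. */
struct kfence *barrier = kfence_create(GFP_KERNEL);

kfence_await_dma_fence(barrier, plane_fence, GFP_KERNEL);  /* hw event */
kfence_await_completion(barrier, &flip_done, GFP_KERNEL);  /* sw event */
kfence_await_kfence(barrier, prev_commit, GFP_KERNEL);     /* prior commit */

kfence_complete(barrier);
kfence_wait(barrier); /* or arm a callback to drive a state machine */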
