Message-ID: <1e1cc8e3-8522-904f-6458-51dc8b212889@quicinc.com>
Date:   Sun, 20 Nov 2022 15:01:59 -0700
From:   Jeffrey Hugo <quic_jhugo@...cinc.com>
To:     Oded Gabbay <ogabbay@...nel.org>, David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
CC:     Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <mripard@...nel.org>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        Arnd Bergmann <arnd@...db.de>, <linux-kernel@...r.kernel.org>,
        <dri-devel@...ts.freedesktop.org>,
        Yuji Ishikawa <yuji2.ishikawa@...hiba.co.jp>,
        Jiho Chu <jiho.chu@...sung.com>,
        Daniel Stone <daniel@...ishbar.org>,
        Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Christoph Hellwig <hch@...radead.org>,
        Kevin Hilman <khilman@...libre.com>,
        Jagan Teki <jagan@...rulasolutions.com>,
        John Hubbard <jhubbard@...dia.com>,
        Alex Deucher <alexander.deucher@....com>,
        Jacek Lawrynowicz <jacek.lawrynowicz@...ux.intel.com>,
        Maciej Kwapulinski <maciej.kwapulinski@...ux.intel.com>,
        Christopher Friedt <chrisfriedt@...il.com>
Subject: Re: [PATCH v4 4/4] doc: add documentation for accel subsystem

On 11/19/2022 1:44 PM, Oded Gabbay wrote:
> Add an introduction section for the accel subsystem. Most of the
> relevant data is in the DRM documentation, so the introduction only
> presents the why of the new subsystem, how are the compute accelerators
> exposed to user-space and what changes need to be done in a standard
> DRM driver to register it to the new accel subsystem.
> 
> Signed-off-by: Oded Gabbay <ogabbay@...nel.org>
> ---
>   Documentation/accel/index.rst        |  17 +++++
>   Documentation/accel/introduction.rst | 109 +++++++++++++++++++++++++++
>   Documentation/subsystem-apis.rst     |   1 +
>   MAINTAINERS                          |   1 +
>   4 files changed, 128 insertions(+)
>   create mode 100644 Documentation/accel/index.rst
>   create mode 100644 Documentation/accel/introduction.rst
> 
> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
> new file mode 100644
> index 000000000000..2b43c9a7f67b
> --- /dev/null
> +++ b/Documentation/accel/index.rst
> @@ -0,0 +1,17 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +====================
> +Compute Accelerators
> +====================
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   introduction
> +
> +.. only::  subproject and html
> +
> +   Indices
> +   =======
> +
> +   * :ref:`genindex`
> diff --git a/Documentation/accel/introduction.rst b/Documentation/accel/introduction.rst
> new file mode 100644
> index 000000000000..5a3963eae973
> --- /dev/null
> +++ b/Documentation/accel/introduction.rst
> @@ -0,0 +1,109 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============
> +Introduction
> +============
> +
> +The Linux compute accelerators subsystem is designed to expose compute
> +accelerators in a common way to user-space and provide a common set of
> +functionality.
> +
> +These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
> +Although these devices are typically designed to accelerate Machine-Learning
> +and/or Deep-Learning computations, the accel layer is not limited to handling

You use "DL" later on as a short form for Deep-Learning.  It would be 
good to introduce that here.

> +these types of accelerators.
> +
> +typically, a compute accelerator will belong to one of the following

Nit: capitalize this, "Typically".

> +categories:
> +
> +- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
> +  or an IP inside a SoC (e.g. laptop web camera). These devices
> +  are typically configured using registers and can work with or without DMA.
> +
> +- Inference data-center - single/multi user devices in a large server. This
> +  type of device can be stand-alone or an IP inside a SoC or a GPU. It will
> +  have on-board DRAM (to hold the DL topology), DMA engines and
> +  command submission queues (either kernel or user-space queues).
> +  It might also have an MMU to manage multiple users and might also enable
> +  virtualization (SR-IOV) to support multiple VMs on the same device. In
> +  addition, these devices will usually have some tools, such as profiler and
> +  debugger.
> +
> +- Training data-center - Similar to Inference data-center cards, but typically
> +  have more computational power and memory b/w (e.g. HBM) and will likely have
> +  a method of scaling-up/out, i.e. connecting to other training cards inside
> +  the server or in other servers, respectively.
> +
> +All these devices typically have different runtime user-space software stacks,
> +that are tailored-made to their h/w. In addition, they will also probably
> +include a compiler to generate programs to their custom-made computational
> +engines. Typically, the common layer in user-space will be the DL frameworks,
> +such as PyTorch and TensorFlow.
> +
> +Sharing code with DRM
> +=====================
> +
> +Because this type of devices can be an IP inside GPUs or have similar
> +characteristics as those of GPUs, the accel subsystem will use the
> +DRM subsystem's code and functionality. i.e. the accel core code will
> +be part of the DRM subsystem and an accel device will be a new type of DRM
> +device.
> +
> +This will allow us to leverage the extensive DRM code-base and
> +collaborate with DRM developers that have experience with this type of
> +devices. In addition, new features that will be added for the accelerator
> +drivers can be of use to GPU drivers as well.
> +
> +Differentiation from GPUs
> +=========================
> +
> +Because we want to prevent the extensive user-space graphic software stack
> +from trying to use an accelerator as a GPU, the compute accelerators will be
> +differentiated from GPUs by using a new major number and new device char files.
> +
> +Furthermore, the drivers will be located in a separate place in the kernel
> +tree - drivers/accel/.
> +
> +The accelerator devices will be exposed to the user space with the dedicated
> +261 major number and will have the following convention:
> +
> +- device char files - /dev/accel/accel*
> +- sysfs             - /sys/class/accel/accel*/
> +- debugfs           - /sys/kernel/debug/accel/accel*/
> +
> +Getting Started
> +===============
> +
> +First, read the DRM documentation. Not only it will explain how to write a new

How about a link to the DRM documentation?

> +DRM driver but it will also contain all the information on how to contribute,
> +the Code Of Conduct and what is the coding style/documentation. All of that
> +is the same for the accel subsystem.
> +
> +Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
> +
> +To expose your device as an accelerator, two changes are needed to
> +be done in your driver (as opposed to a standard DRM driver):
> +
> +- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
> +  driver_features field. It is important to note that this driver feature is
> +  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want

I don't remember seeing code that validates that a driver setting
DRIVER_COMPUTE_ACCEL does not also set DRIVER_MODESET.  What am I missing?

> +  to expose both graphics and compute device char files should be handled by
> +  two drivers that are connected using the auxiliary bus framework.
> +
> +- Change the open callback in your driver fops structure to accel_open().
> +  Alternatively, your driver can use DEFINE_DRM_ACCEL_FOPS macro to easily
> +  set the correct function operations pointers structure.
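
To make the mutual-exclusivity rule in the first bullet concrete, a
check along the following lines is what the wording implies. This is a
stand-alone sketch under assumptions, not code from this series: the
function name accel_features_valid() is hypothetical, and the bit
positions are placeholders so the snippet compiles outside the kernel
(the real definitions are in include/drm/drm_drv.h).

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; see include/drm/drm_drv.h. */
#define DRIVER_MODESET        (1U << 1)
#define DRIVER_RENDER         (1U << 3)
#define DRIVER_COMPUTE_ACCEL  (1U << 7)

/*
 * Hypothetical validation helper: a driver that sets
 * DRIVER_COMPUTE_ACCEL must set neither DRIVER_RENDER nor
 * DRIVER_MODESET. Drivers without the accel flag are not restricted
 * by this rule.
 */
static bool accel_features_valid(uint32_t driver_features)
{
	if (!(driver_features & DRIVER_COMPUTE_ACCEL))
		return true;

	return !(driver_features & (DRIVER_RENDER | DRIVER_MODESET));
}
```

In a real driver core, a check of this shape would presumably run at
device registration time and fail registration when it returns false.
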
> +
> +External References
> +===================
> +
> +email threads
> +-------------
> +
> +* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
> +* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)
> +
> +Conference talks
> +----------------
> +
> +* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)
> diff --git a/Documentation/subsystem-apis.rst b/Documentation/subsystem-apis.rst
> index af65004a80aa..b51f38527e14 100644
> --- a/Documentation/subsystem-apis.rst
> +++ b/Documentation/subsystem-apis.rst
> @@ -43,6 +43,7 @@ needed).
>      input/index
>      hwmon/index
>      gpu/index
> +   accel/index
>      security/index
>      sound/index
>      crypto/index
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4d752aac3ec0..6ba7bb35208a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6837,6 +6837,7 @@ L:	dri-devel@...ts.freedesktop.org
>   S:	Maintained
>   C:	irc://irc.oftc.net/dri-devel
>   T:	git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git
> +F:	Documentation/accel/
>   F:	drivers/accel/
>   
>   DRM DRIVERS FOR ALLWINNER A10
