Message-ID: <1aadcb3d-75e2-285c-2244-e472cc21bb97@quicinc.com>
Date: Wed, 14 Aug 2024 12:49:40 -0600
From: Jeffrey Hugo <quic_jhugo@...cinc.com>
To: Lizhi Hou <lizhi.hou@....com>, <ogabbay@...nel.org>,
        <dri-devel@...ts.freedesktop.org>
CC: <linux-kernel@...r.kernel.org>, <min.ma@....com>, <max.zhen@....com>,
        <sonal.santan@....com>, <king.tam@....com>
Subject: Re: [PATCH V2 00/10] AMD XDNA driver

On 8/12/2024 12:16 PM, Lizhi Hou wrote:
> 
> On 8/9/24 08:21, Jeffrey Hugo wrote:
>> On 8/5/2024 11:39 AM, Lizhi Hou wrote:
>>> This patchset introduces a new Linux Kernel Driver, amdxdna for AMD NPUs.
>>> The driver is based on Linux accel subsystem.
>>>
>>> NPU (Neural Processing Unit) is an AI inference accelerator integrated
>>> into AMD client CPUs. NPU enables efficient execution of Machine Learning
>>> applications like CNNs, LLMs, etc.  NPU is based on AMD XDNA
>>> architecture [1].
>>>
>>> AMD NPU consists of the following components:
>>>
>>>    - Tiled array of AMD AI Engine processors.
>>>    - Micro Controller which runs the NPU Firmware responsible for
>>>      command processing, AIE array configuration, and execution management.
>>>    - PCI EP for host control of the NPU device.
>>>    - Interconnect for connecting the NPU components together.
>>>    - SRAM for use by the NPU Firmware.
>>>    - Address translation hardware for protected host memory access by the
>>>      NPU.
>>>
>>> NPU supports multiple concurrent fully isolated contexts. Concurrent
>>> contexts may be bound to AI Engine array spatially and or temporarily.
>>>
>>> The driver is licensed under GPL-2.0 except for UAPI header which is
>>> licensed GPL-2.0 WITH Linux-syscall-note.
>>>
>>> User mode driver stack consists of XRT [2] and AMD AIE Plugin for IREE [3].
>>
>> Is there a special branch with the code?  I don't see any of the uAPI 
>> in either project when searching for the ioctl codes or ioctl structures.
> 
> Please see git repo: https://github.com/amd/xdna-driver
> 
> This contains the out tree driver and shim code which interact with 
> driver. E.g.
> 
> https://github.com/amd/xdna-driver/blob/main/src/shim/bo.cpp#L18

Ok, I need to have a look at this.  Long term, is the plan to move the 
shim to the XRT repo once the driver is merged upstream?

> 
>>
>>>
>>> The firmware for the NPU is distributed as a closed source binary, and has
>>> already been pushed to the DRM firmware repository [4].
>>>
>>> [1] https://www.amd.com/en/technologies/xdna.html
>>> [2] https://github.com/Xilinx/XRT
>>> [3] https://github.com/nod-ai/iree-amd-aie
>>> [4] https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu
>>>
>>> Changes since v1:
>>> - Remove some inline defines
>>> - Minor changes based code review comments
>>>
>>> Lizhi Hou (10):
>>>    accel/amdxdna: Add a new driver for AMD AI Engine
>>>    accel/amdxdna: Support hardware mailbox
>>>    accel/amdxdna: Add hardware resource solver
>>>    accel/amdxdna: Add hardware context
>>>    accel/amdxdna: Add GEM buffer object management
>>>    accel/amdxdna: Add command execution
>>>    accel/amdxdna: Add suspend and resume
>>>    accel/amdxdna: Add error handling
>>>    accel/amdxdna: Add query functions
>>>    accel/amdxdna: Add firmware debug buffer support
>>>
>>>   MAINTAINERS                                   |   9 +
>>>   drivers/accel/Kconfig                         |   1 +
>>>   drivers/accel/Makefile                        |   1 +
>>>   drivers/accel/amdxdna/Kconfig                 |  15 +
>>>   drivers/accel/amdxdna/Makefile                |  22 +
>>>   drivers/accel/amdxdna/TODO                    |   4 +
>>>   drivers/accel/amdxdna/aie2_ctx.c              | 949 ++++++++++++++++++
>>>   drivers/accel/amdxdna/aie2_error.c            | 349 +++++++
>>>   drivers/accel/amdxdna/aie2_message.c          | 775 ++++++++++++++
>>>   drivers/accel/amdxdna/aie2_msg_priv.h         | 372 +++++++
>>>   drivers/accel/amdxdna/aie2_pci.c              | 756 ++++++++++++++
>>>   drivers/accel/amdxdna/aie2_pci.h              | 264 +++++
>>>   drivers/accel/amdxdna/aie2_psp.c              | 137 +++
>>>   drivers/accel/amdxdna/aie2_smu.c              | 112 +++
>>>   drivers/accel/amdxdna/aie2_solver.c           | 329 ++++++
>>>   drivers/accel/amdxdna/aie2_solver.h           | 156 +++
>>>   drivers/accel/amdxdna/amdxdna_ctx.c           | 597 +++++++++++
>>>   drivers/accel/amdxdna/amdxdna_ctx.h           | 165 +++
>>>   drivers/accel/amdxdna/amdxdna_drm.c           | 172 ++++
>>>   drivers/accel/amdxdna/amdxdna_drm.h           | 114 +++
>>>   drivers/accel/amdxdna/amdxdna_gem.c           | 700 +++++++++++++
>>>   drivers/accel/amdxdna/amdxdna_gem.h           |  73 ++
>>>   drivers/accel/amdxdna/amdxdna_mailbox.c       | 582 +++++++++++
>>>   drivers/accel/amdxdna/amdxdna_mailbox.h       | 124 +++
>>>   .../accel/amdxdna/amdxdna_mailbox_helper.c    |  50 +
>>>   .../accel/amdxdna/amdxdna_mailbox_helper.h    |  43 +
>>>   drivers/accel/amdxdna/amdxdna_pci_drv.c       | 234 +++++
>>>   drivers/accel/amdxdna/amdxdna_pci_drv.h       |  31 +
>>>   drivers/accel/amdxdna/amdxdna_sysfs.c         |  58 ++
>>>   drivers/accel/amdxdna/npu1_regs.c             |  94 ++
>>>   drivers/accel/amdxdna/npu2_regs.c             | 111 ++
>>>   drivers/accel/amdxdna/npu4_regs.c             | 111 ++
>>>   drivers/accel/amdxdna/npu5_regs.c             | 111 ++
>>>   include/trace/events/amdxdna.h                | 101 ++
>>>   include/uapi/drm/amdxdna_accel.h              | 456 +++++++++
>>>   35 files changed, 8178 insertions(+)
>>>   create mode 100644 drivers/accel/amdxdna/Kconfig
>>>   create mode 100644 drivers/accel/amdxdna/Makefile
>>>   create mode 100644 drivers/accel/amdxdna/TODO
>>>   create mode 100644 drivers/accel/amdxdna/aie2_ctx.c
>>>   create mode 100644 drivers/accel/amdxdna/aie2_error.c
>>>   create mode 100644 drivers/accel/amdxdna/aie2_message.c
>>>   create mode 100644 drivers/accel/amdxdna/aie2_msg_priv.h
>>>   create mode 100644 drivers/accel/amdxdna/aie2_pci.c
>>>   create mode 100644 drivers/accel/amdxdna/aie2_pci.h
>>>   create mode 100644 drivers/accel/amdxdna/aie2_psp.c
>>>   create mode 100644 drivers/accel/amdxdna/aie2_smu.c
>>>   create mode 100644 drivers/accel/amdxdna/aie2_solver.c
>>>   create mode 100644 drivers/accel/amdxdna/aie2_solver.h
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.c
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.h
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_drm.c
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_drm.h
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_gem.c
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_gem.h
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.c
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.h
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.c
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.h
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.c
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.h
>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_sysfs.c
>>>   create mode 100644 drivers/accel/amdxdna/npu1_regs.c
>>>   create mode 100644 drivers/accel/amdxdna/npu2_regs.c
>>>   create mode 100644 drivers/accel/amdxdna/npu4_regs.c
>>>   create mode 100644 drivers/accel/amdxdna/npu5_regs.c
>>>   create mode 100644 include/trace/events/amdxdna.h
>>>   create mode 100644 include/uapi/drm/amdxdna_accel.h
>>>
>>
>> No Documentation?
> 
> Is it ok to add a work item to TODO and add documentation in later patches?

I believe best practice would be to add Documentation in the same 
patch/series that adds the functionality.  I'm not expecting 
Documentation for items not implemented in this series; however, I think 
describing the product/architecture/other high level topics would help 
put the code in context during review.

It does seem like the AMD GPU driver had a lot of documentation, which 
makes the lack of documentation for the AMD Accel driver particularly odd.

