lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49b76568-572d-b3ef-da79-36f1a3c44f27@quicinc.com>
Date: Fri, 25 Oct 2024 16:02:13 -0600
From: Jeffrey Hugo <quic_jhugo@...cinc.com>
To: Lizhi Hou <lizhi.hou@....com>, <ogabbay@...nel.org>,
        <dri-devel@...ts.freedesktop.org>
CC: <linux-kernel@...r.kernel.org>, <min.ma@....com>, <max.zhen@....com>,
        <sonal.santan@....com>, <king.tam@....com>
Subject: Re: [PATCH V5 00/10] AMD XDNA driver

On 10/25/2024 3:28 PM, Lizhi Hou wrote:
> 
> On 10/25/24 10:55, Jeffrey Hugo wrote:
>> On 10/21/2024 10:19 AM, Lizhi Hou wrote:
>>> This patchset introduces a new Linux Kernel Driver, amdxdna for AMD 
>>> NPUs.
>>> The driver is based on Linux accel subsystem.
>>>
>>> NPU (Neural Processing Unit) is an AI inference accelerator integrated
>>> into AMD client CPUs. NPU enables efficient execution of Machine 
>>> Learning
>>> applications like CNNs, LLMs, etc.  NPU is based on AMD XDNA
>>> architecture [1].
>>>
>>> AMD NPU consists of the following components:
>>>
>>>    - Tiled array of AMD AI Engine processors.
>>>    - Micro Controller which runs the NPU Firmware responsible for
>>>      command processing, AIE array configuration, and execution 
>>> management.
>>>    - PCI EP for host control of the NPU device.
>>>    - Interconnect for connecting the NPU components together.
>>>    - SRAM for use by the NPU Firmware.
>>>    - Address translation hardware for protected host memory access by 
>>> the
>>>      NPU.
>>>
>>> NPU supports multiple concurrent fully isolated contexts. Concurrent
>>> contexts may be bound to AI Engine array spatially and or temporarily.
>>>
>>> The driver is licensed under GPL-2.0 except for UAPI header which is
>>> licensed GPL-2.0 WITH Linux-syscall-note.
>>>
>>> User mode driver stack consists of XRT [2] and AMD AIE Plugin for 
>>> IREE [3].
>>>
>>> The firmware for the NPU is distributed as a closed source binary, 
>>> and has
>>> already been pushed to the DRM firmware repository [4].
>>>
>>> [1]https://www.amd.com/en/technologies/xdna.html
>>> [2]https://github.com/Xilinx/XRT
>>> [3]https://github.com/nod-ai/iree-amd-aie
>>> [4]https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu 
>>>
>>>
>>> Changes since v4:
>>> - Fix lockdep errors
>>> - Use __u* structure for struct aie_error
>>
>> One nit, when you send the next version would you please either To: or 
>> Cc: me on the entire series?  I only get pieces in my inbox which is 
>> mildly annoying on my end.
> Sure.
>>
>> Looks like we are getting close here.  One procedural question I have, 
>> do you have commit permissions to drm-misc?
> No, I do not have commit permissions yet.

You should apply for access.  Assuming this series is ready before that 
goes through, I'll apply it.

>> I applied the series to drm-misc-next and tried to build.  Got the 
>> following errors -
> 
> Could you share the build command line? So I can reproduce and verify my 
> fix.

The command is simple:
make -j20

The system details, incase it somehow matters:
Ubuntu 22.04 w/ 5.15 kernel

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

$ uname -a
Linux jhugo-lnx 5.15.0-89-generic #99-Ubuntu SMP Mon Oct 30 20:42:41 UTC 
2023 x86_64 x86_64 x86_64 GNU/Linux

The kernel config is probably the relevant piece.  When I first built 
after applying the series, I was asked to choose what to do with the new 
config item.  I selected =m.
.config can be found at 
https://gist.github.com/quic-jhugo/4cc249b1e3ba127039fbc709a513a432

> 
> I used "make M=drivers/accel/amdxdna" and did not reproduce the error 
> with drm-misc-next. It looks build robot did not complain with the patch 
> neither.
> 
> $ git branch
> * drm-misc-next
> $ make M=drivers/accel/amdxdna
>    CC [M]  drivers/accel/amdxdna/aie2_ctx.o
>    CC [M]  drivers/accel/amdxdna/aie2_error.o
>    CC [M]  drivers/accel/amdxdna/aie2_message.o
>    CC [M]  drivers/accel/amdxdna/aie2_pci.o
>    CC [M]  drivers/accel/amdxdna/aie2_psp.o
>    CC [M]  drivers/accel/amdxdna/aie2_smu.o
>    CC [M]  drivers/accel/amdxdna/aie2_solver.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
>    CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
>    CC [M]  drivers/accel/amdxdna/npu1_regs.o
>    CC [M]  drivers/accel/amdxdna/npu2_regs.o
>    CC [M]  drivers/accel/amdxdna/npu4_regs.o
>    CC [M]  drivers/accel/amdxdna/npu5_regs.o
>    LD [M]  drivers/accel/amdxdna/amdxdna.o
>    MODPOST drivers/accel/amdxdna/Module.symvers
>    CC [M]  drivers/accel/amdxdna/amdxdna.mod.o
>    CC [M]  drivers/accel/amdxdna/.module-common.o
>    LD [M]  drivers/accel/amdxdna/amdxdna.ko
> $
> 
>>
>>   CC [M]  drivers/accel/amdxdna/aie2_ctx.o
>>   CC [M]  drivers/accel/amdxdna/aie2_error.o
>>   CC [M]  drivers/accel/amdxdna/aie2_message.o
>>   CC [M]  drivers/accel/amdxdna/aie2_pci.o
>>   CC [M]  drivers/accel/amdxdna/aie2_psp.o
>>   CC [M]  drivers/accel/amdxdna/aie2_smu.o
>>   CC [M]  drivers/accel/amdxdna/aie2_solver.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
>>   CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
>>   CC [M]  drivers/accel/amdxdna/npu1_regs.o
>>   CC [M]  drivers/accel/amdxdna/npu2_regs.o
>>   CC [M]  drivers/accel/amdxdna/npu4_regs.o
>>   CC [M]  drivers/accel/amdxdna/npu5_regs.o
>>   AR      drivers/base/firmware_loader/built-in.a
>>   AR      drivers/base/built-in.a
>> In file included from drivers/accel/amdxdna/aie2_message.c:19:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> In file included from drivers/accel/amdxdna/amdxdna_gem.c:15:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> In file included from drivers/accel/amdxdna/aie2_psp.c:11:
>> drivers/accel/amdxdna/aie2_psp.c: In function ‘psp_exec’:
>> drivers/accel/amdxdna/aie2_psp.c:62:34: error: implicit declaration of 
>> function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
>>    62 | FIELD_GET(PSP_STATUS_READY, ready),
>>       |                                  ^~~~~~~~~
>> ./include/linux/iopoll.h:47:21: note: in definition of macro 
>> ‘read_poll_timeout’
>>    47 |                 if (cond) \
>>       |                     ^~~~
>> drivers/accel/amdxdna/aie2_psp.c:61:15: note: in expansion of macro 
>> ‘readx_poll_timeout’
>>    61 |         ret = readx_poll_timeout(readl, PSP_REG(psp, 
>> PSP_STATUS_REG), ready,
>>       |               ^~~~~~~~~~~~~~~~~~
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> In file included from drivers/accel/amdxdna/aie2_pci.c:22:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> In file included from drivers/accel/amdxdna/aie2_ctx.c:18:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> In file included from drivers/accel/amdxdna/amdxdna_ctx.c:16:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> cc1: all warnings being treated as errors
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> drivers/accel/amdxdna/aie2_ctx.c: In function ‘aie2_hwctx_restart’:
>> drivers/accel/amdxdna/aie2_ctx.c:114:9: error: too few arguments to 
>> function ‘drm_sched_start’
>>   114 |         drm_sched_start(&hwctx->priv->sched);
>>       |         ^~~~~~~~~~~~~~~
>> In file included from ./include/trace/events/amdxdna.h:12,
>>                  from drivers/accel/amdxdna/aie2_ctx.c:13:
>> ./include/drm/gpu_scheduler.h:593:6: note: declared here
>>   593 | void drm_sched_start(struct drm_gpu_scheduler *sched, int errno);
>>       |      ^~~~~~~~~~~~~~~
>> make[5]: *** [scripts/Makefile.build:229: 
>> drivers/accel/amdxdna/aie2_psp.o] Error 1
>> make[5]: *** Waiting for unfinished jobs....
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> In file included from drivers/accel/amdxdna/amdxdna_pci_drv.c:18:
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>> declaration of function ‘FIELD_GET’ 
>> [-Werror=implicit-function-declaration]
>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>       |                ^~~~~~~~~
>> cc1: all warnings being treated as errors
>> make[5]: *** [scripts/Makefile.build:229: 
>> drivers/accel/amdxdna/aie2_ctx.o] Error 1
>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>       |                        ^~~~~~~~~~
>> drivers/accel/amdxdna/amdxdna_mailbox.c: In function 
>> ‘xdna_mailbox_send_msg’:
>> drivers/accel/amdxdna/amdxdna_mailbox.c:444:26: error: implicit 
>> declaration of function ‘FIELD_PREP’ 
>> [-Werror=implicit-function-declaration]
>>   444 |         header->sz_ver = FIELD_PREP(MSG_BODY_SZ, 
>> msg->send_size) |
>>       |                          ^~~~~~~~~~
>>
>>
>> You also have the following checkpatch issues -
> 
> Could you share the command you used?  I tried to use 'dim checkpatch' 
> and it did not find out the misspelling issue.

./scripts/checkpatch.pl --strict --codespell *.patch

Note, --codespell requires some local setup.  I beleive the comments in 
the checkpatch.pl script are fairly straightforward.  I use a copy of 
the database from the github that is rather recent.  The Ubuntu distro 
package is really out of date and I don't think I looked to see if there 
is a pythong pip version.  Grabbing the one file from the github repo 
seemed simple emough.

-Jeff

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ